E-Locus - Institutional Repository of the University of Crete - Learning causal structure from overlapping variable sets

Identifier

000356936

http://elocus.lib.uoc.gr

Title

Learning causal structure from overlapping variable sets

Alternative Title

Εκμάθηση αιτιακής δομής από αλληλοεπικαλυπτόμενα σύνολα μεταβλητών

Author

Τριανταφύλλου, Σοφία

Thesis advisor

Τόλλης, Ιωάννης Γ.

Abstract

Modern data-analysis methods are typically applicable to a single dataset. In particular, they cannot integratively analyze datasets containing different, but overlapping, sets of variables. We show that by employing causal models instead of models based on the concept of association alone, integrative analysis can yield additional interesting inferences beyond those obtainable by analyzing each dataset independently. Specifically, we assume that all datasets are generated by a single overarching causal model representable by a Maximal Ancestral Graph; Maximal Ancestral Graphs are a class of graphical independence models designed to model marginal distributions and cope with causal insufficiency (latent confounding variables). We rigorously define the problem of identifying one or all causal models that simultaneously fit the available data. We propose a novel algorithm, FCM, that converts this problem to a SAT formula whose solutions correspond to all plausible causal models. We also introduce a new kind of graphical model, the Pairwise Causal Graph (PCG), that succinctly summarizes all pairwise causal relations among the variables. Based on FCM, we propose cSAT+, an algorithm that outputs the PCG when given a set of datasets, and prove that this algorithm is sound and complete in the absence of statistical errors. In our empirical evaluation on simulated datasets, we show that integrative analysis using cSAT+ makes more sound causal inferences than analysis of each dataset in isolation. Examples of interesting inferences include the induction of the absence or the presence of some kind of causal relation between two variables never measured together. The latter observation has significant ramifications for data analysis as it implies that additional causal

Language

English

Issue date

2010-01-18

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses
