Abstract |
Modern data-analysis methods are typically applicable to a single dataset. In particular, they cannot integratively analyze datasets containing different, but overlapping, sets of variables. We show that by employing causal models instead of models based on the concept of association alone, integrative analysis yields interesting inferences beyond those obtainable by analyzing each dataset independently.
Specifically, we assume that all datasets are generated by a single overarching causal model representable by a Maximal Ancestral Graph; Maximal Ancestral Graphs are a class of graphical independence models
designed to model marginal distributions and cope with causal insufficiency (latent confounding variables).
We rigorously define the problem of identifying one or all causal models
that simultaneously fit the available data. We propose a novel algorithm, FCM, that converts this problem to a SAT formula whose solutions correspond to all plausible causal models. We also introduce a new
kind of graphical model, the Pairwise Causal Graph (PCG), that succinctly
summarizes all pairwise causal relations among the variables. Based on FCM, we propose cSAT+, an algorithm that outputs the PCG when given a set of
datasets, and prove that this algorithm is sound and complete in the absence of statistical errors. In our empirical evaluation on simulated datasets, we show that integrative analysis using cSAT+ makes more sound causal inferences than analysis of each dataset in isolation. Examples of interesting inferences include inferring the absence or the presence of some kind of causal relation between two variables never measured together. The latter observation
has significant ramifications for data analysis as it implies that additional causal
|