Abstract
Compared to constraint-based causal discovery, causal discovery based on functional causal models can identify the whole causal model under appropriate assumptions [Shimizu et al. 2006; Hoyer et al. 2009; Zhang and Hyvärinen 2009b]. Functional causal models represent the effect as a function of its direct causes together with an independent noise term. Examples include the linear non-Gaussian acyclic model (LiNGAM), the nonlinear additive noise model, and the post-nonlinear (PNL) model. Currently, there are two ways to estimate the parameters of these models: dependence minimization and maximum likelihood. In this article, we show that for any acyclic functional causal model, minimizing the mutual information between the hypothetical cause and the noise term is equivalent to maximizing the data likelihood with a flexible model for the distribution of the noise term. We then focus on estimation of the PNL causal model and propose to estimate it with a warped Gaussian process whose noise is modeled by a mixture of Gaussians. As a Bayesian nonparametric approach, it outperforms the previous approach based on mutual information minimization with nonlinear functions represented by multilayer perceptrons. We also show that, unlike ordinary regression, estimation of the PNL causal model is sensitive to the assumed noise distribution. Experimental results on both synthetic and real data support our theoretical claims.
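The equivalence between mutual information minimization and maximum likelihood can be sketched for the two-variable case as follows. This is an illustrative outline and not the article's own derivation. The notation is assumed here: the model is written as $Y = f(X, N; \theta)$ with $f$ invertible in $N$, $\hat{N}$ is the noise recovered from $(X, Y)$ under parameters $\theta$, $q$ is the model density for the noise, and expectations are taken over the data distribution.

```latex
\begin{align}
  p(x, y; \theta)
    &= p_X(x)\, q(\hat{n})
       \left| \frac{\partial \hat{n}}{\partial y} \right|, \\
  \mathbb{E}\!\left[ \log p(X, Y; \theta) \right]
    &= -H(X) + \mathbb{E}\!\left[ \log q(\hat{N}) \right]
       + \mathbb{E}\!\left[ \log \left|
         \frac{\partial \hat{N}}{\partial Y} \right| \right], \\
  H(X, \hat{N})
    &= H(X, Y) + \mathbb{E}\!\left[ \log \left|
         \frac{\partial \hat{N}}{\partial Y} \right| \right], \\
  I(X; \hat{N})
    &= H(X) + H(\hat{N}) - H(X, Y)
       - \mathbb{E}\!\left[ \log \left|
         \frac{\partial \hat{N}}{\partial Y} \right| \right].
\end{align}
% With a flexible noise model, q can match the density of \hat{N}, so that
% E[log q(\hat{N})] = -H(\hat{N}). Substituting into the second line gives
\begin{equation}
  \mathbb{E}\!\left[ \log p(X, Y; \theta) \right]
    = -I(X; \hat{N}) - H(X, Y).
\end{equation}
```

Since $H(X, Y)$ does not depend on $\theta$, maximizing the expected log-likelihood under these assumptions amounts to minimizing $I(X; \hat{N})$. For a fixed, misspecified $q$, the term $\mathbb{E}[\log q(\hat{N})]$ falls below $-H(\hat{N})$ and the two objectives no longer coincide, which is consistent with the sensitivity to the noise assumption noted above.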
References
- P. J. Bickel and K. A. Doksum. 1981. An analysis of transformations revisited. Journal of the American Statistical Association 76, 296--311.
- T. M. Cover and J. A. Thomas. 1991. Elements of Information Theory. Wiley.
- A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. 2008. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 585--592.
- P. O. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Schölkopf. 2009. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21.
- A. Hyvärinen, J. Karhunen, and E. Oja. 2001. Independent Component Analysis. John Wiley & Sons.
- A. Hyvärinen and P. Pajunen. 1999. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks 12, 3, 429--439.
- D. Janzing, J. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, and B. Schölkopf. 2012. Information-geometric approach to inferring causal directions. Artificial Intelligence 182--183, 1--31.
- R. A. Levine and G. Casella. 2001. Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics 10, 3, 422--439.
- J. Mooij, D. Janzing, J. Peters, and B. Schölkopf. 2009. Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML’09). 745--752.
- J. Mooij, O. Stegle, D. Janzing, K. Zhang, and B. Schölkopf. 2010. Probabilistic latent variable models for distinguishing between cause and effect. In Advances in Neural Information Processing Systems 23.
- J. Pearl. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, MA.
- B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. 2012. On causal and anticausal learning. In Proceedings of the 29th International Conference on Machine Learning (ICML’12).
- S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. 2006. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research 7, 2003--2030.
- E. Snelson, C. E. Rasmussen, and Z. Ghahramani. 2004. Warped Gaussian processes. In Advances in Neural Information Processing Systems 16.
- P. Spirtes, C. Glymour, and R. Scheines. 2001. Causation, Prediction, and Search (2nd ed.). MIT Press, Cambridge, MA.
- A. Taleb and C. Jutten. 1999. Source separation in post-nonlinear mixtures. IEEE Transactions on Signal Processing 47, 10, 2807--2820.
- M. Yamada and M. Sugiyama. 2010. Dependence minimizing regression with model selection for non-linear causal inference under non-Gaussian noise. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI’10). 643--648.
- K. Zhang and L. Chan. 2005. Extended Gaussianization method for blind separation of post-nonlinear mixtures. Neural Computation 17, 2, 425--452.
- K. Zhang and A. Hyvärinen. 2009a. Acyclic causality discovery with additive noise: An information-theoretical perspective. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’09).
- K. Zhang and A. Hyvärinen. 2009b. On the identifiability of the post-nonlinear causal model. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence.
- K. Zhang, J. Peters, D. Janzing, and B. Schölkopf. 2011. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI’11).
- K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. 2013a. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning.
- K. Zhang, Z. Wang, and B. Schölkopf. 2013b. On estimation of functional causal models: Post-nonlinear causal model as an example. In Proceedings of the IEEE 13th International Conference on Data Mining Workshops. 139--146.
Index Terms
- On Estimation of Functional Causal Models: General Results and Application to the Post-Nonlinear Causal Model