
I'm interested in pursuing research at the intersection of causal inference and machine learning, particularly causal discovery and causal representation learning. From my exploration so far, I believe studying the following books is essential before reading research in this field.

  1. Strong ML foundations through the books by Murphy or Bishop (either one will do).
  2. Understanding Machine Learning (Part 1) by Shai Shalev-Shwartz and Shai Ben-David for theoretical ML background, which is usually assumed before presenting causal learning theory.
  3. Causality by Judea Pearl for an in-depth understanding of causal inference, followed by Elements of Causal Inference by Jonas Peters, Dominik Janzing, and Bernhard Schölkopf for causal discovery.

My questions are: Are these books sufficient preparation for research in this area? If not, what would you add to the list? And what prerequisites are essential for working through these books successfully, such as Bayesian probability for the causality texts, or something else?

2 Answers


The books you have selected provide a strong foundation. However, current research in causal representation learning (CRL) focuses on identifiable causal representations (e.g., disentanglement and grouping of variables) and on latent causal discovery (e.g., nonlinear ICA variants).

CRL is severely ill-posed, since it combines two notoriously ill-posed problems: representation learning and causal discovery. Yet finding practical identifiability conditions that guarantee a unique solution is crucial for its applicability. Most approaches so far rest on assumptions about the latent causal mechanisms, such as temporal causality, or on the existence of supervision or interventions; these can be too restrictive in real applications. One recent line of work establishes identifiability from weaker constraints, requiring no temporal structure, interventions, or weak supervision, and instead assumes that the observational mixing exhibits a suitable grouping of the observed variables.
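To make the setting concrete, here is a minimal numpy sketch of the kind of data-generating process such identifiability results reason about. All names, dimensions, and functional forms are illustrative choices of mine, not taken from any specific paper: a small latent SCM is passed through an unknown nonlinear mixing with a grouping structure, where each observed block depends on a single latent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Latent SCM over two causal variables: z1 -> z2.
z1 = rng.normal(size=n)
z2 = 0.8 * z1 + 0.6 * rng.normal(size=n)

# Unknown nonlinear mixing with a grouping structure: the first block of
# observed coordinates is a function of z1 only, the second of z2 only.
x = np.column_stack([
    np.tanh(z1), z1 + 0.1 * z1**3,   # observed group A (driven by z1)
    np.sin(z2), z2 + 0.1 * z2**3,    # observed group B (driven by z2)
])

# The CRL problem: given only x (and knowledge of the grouping), recover
# the latents z1, z2 and the causal graph z1 -> z2, up to simple
# transformations, without interventions or temporal structure.
print(x.shape)  # (1000, 4)
```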

Related work studies the identifiability theory of probabilistic models and establishes sufficient conditions under which the representations learned by a very broad family of conditional energy-based models are unique in function space, up to a simple transformation. In this model family, the energy function is the dot product between two feature extractors, one for the dependent variable and one for the conditioning variable. Under mild conditions, the learned features are unique up to scaling and permutation. These results extend recent developments in nonlinear ICA and, in fact, generalize ICA models: such a model can estimate the components in the framework of Independently Modulated Component Analysis (IMCA), a generalization of nonlinear ICA that relaxes the independence assumption. Empirically, representations learned this way from real-world image datasets are identifiable and improve performance in transfer learning and semi-supervised learning tasks.
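As a rough illustration of that model family (a sketch with made-up toy feature extractors, not the architecture from any particular paper), the conditional energy is simply a dot product between learned features of the dependent variable y and features of the conditioning variable x:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature extractors; in practice these would be deep networks.
W_f = rng.normal(size=(5, 3))  # maps the dependent variable y
W_g = rng.normal(size=(5, 4))  # maps the conditioning variable x

def f(y):
    return np.tanh(W_f @ y)

def g(x):
    return np.tanh(W_g @ x)

def energy(y, x):
    # The energy is the dot product of the two feature vectors; the model
    # defines p(y | x) proportional to exp(-energy(y, x)).
    return f(y) @ g(x)

y = rng.normal(size=3)
x = rng.normal(size=4)
print(energy(y, x))
```

The identifiability claim for this family is then, roughly, that any two such models defining the same conditional densities must have feature extractors that agree up to scaling and permutation of the feature dimensions.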

In summary, CRL has expanded the applicability of both traditional ML and causal inference and highlighted settings where strong causal assumptions on weakly labeled data may be attainable. You may also refer to the recent CRL paper by Schölkopf et al. (2021), "Towards Causal Representation Learning", whose abstract frames the problem:

"We note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities."

cinch

Beware of the DAG!

Note that both of the causality books you picked operate in a DAG-based framework. It may be interesting to read at least the introduction of some works operating in the potential-outcomes framework (e.g., Imbens and Rubin's "Causal Inference for Statistics, Social, and Biomedical Sciences") to get an idea of their perspective.
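If the potential-outcomes perspective is new to you, a minimal simulation (purely illustrative, with numbers I made up) captures its core objects: each unit carries two potential outcomes Y(0) and Y(1), only one of which is ever observed, and randomized assignment makes the difference in means an unbiased estimate of the average treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: what each unit would experience under control/treatment.
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=n)  # true ATE is 2.0

# Randomized treatment; only one potential outcome per unit is observed.
t = rng.integers(0, 2, size=n)
y_obs = np.where(t == 1, y1, y0)

# Difference in means estimates the average treatment effect E[Y(1) - Y(0)].
ate_hat = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print(round(ate_hat, 3))  # close to 2.0
```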

As for the DAG framework, once you are a little familiar with the basic concepts, I would recommend the excellent "Beware of the DAG!" by Phil Dawid for a more critical exposition. The essay "Overthrowing the Tyranny of Null Hypotheses Hidden in Causal Diagrams" by Sander Greenland could also be interesting in that regard.

Scriddie