Your selected books provide a strong foundation; however, current research in causal representation learning (CRL) focuses on identifiable causal representations (e.g., via disentanglement or grouping of variables) and on latent causal discovery (e.g., nonlinear ICA variants).
CRL is severely ill-posed, since it combines two notoriously ill-posed problems: representation learning and causal discovery. Finding practical identifiability conditions that guarantee a unique solution is therefore crucial for its applicability. Most approaches so far rest on assumptions about the latent causal mechanisms, such as temporal causality, or on the existence of supervision or interventions; these can be too restrictive in real applications. A recent alternative establishes identifiability from novel, weak constraints that require no temporal structure, interventions, or weak supervision, assuming instead that the observational mixing exhibits a suitable grouping of the observed variables.
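To make the notion of identifiability concrete, here is a minimal sketch in standard nonlinear-ICA notation (illustrative only; the precise grouping conditions in that line of work are more technical than this):

% An unknown invertible mixing f maps latent variables z to observations x:
\[
  \mathbf{x} = f(\mathbf{z}), \qquad \mathbf{z} = (z_1, \dots, z_n).
\]
% Identifiability means that any model matching the observed distribution
% recovers the latents up to a permutation \pi and elementwise invertible
% maps h_i:
\[
  \hat{z}_i = h_i\bigl(z_{\pi(i)}\bigr), \qquad i = 1, \dots, n.
\]
% The grouping assumption partitions the observations into known groups
% x = (x^{(1)}, ..., x^{(M)}), each mixed from its own block of latents;
% this structural constraint stands in for the temporal or interventional
% assumptions used elsewhere (hypothetical notation, not the paper's exact
% formulation).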
A representative line of work develops the identifiability theory of probabilistic models, establishing sufficient conditions under which the representations learned by a very broad family of conditional energy-based models are unique in function space, up to a simple transformation. In this model family, the energy function is the dot product between two feature extractors, one for the dependent variable and one for the conditioning variable. Under mild conditions, the learned features are unique up to scaling and permutation. These results extend recent developments in nonlinear ICA and in fact lead to an important generalization of ICA models: the same framework can estimate the components of Independently Modulated Component Analysis (IMCA), a generalization of nonlinear ICA that relaxes the independence assumption. A thorough empirical study shows that representations learned by such models from real-world image datasets are identifiable and improve performance in transfer learning and semi-supervised learning tasks.
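As a concrete illustration of the dot-product energy just described, here is a minimal PyTorch sketch (the class name ConditionalEBM and the MLP architectures are hypothetical; the actual models and training objectives in that work are more elaborate):

import torch
import torch.nn as nn

class ConditionalEBM(nn.Module):
    """Conditional energy-based model whose energy is the dot product
    of two feature extractors, E(x, y) = f(x) . g(y), so that
    p(x | y) is proportional to exp(-f(x)^T g(y))."""

    def __init__(self, x_dim, y_dim, feat_dim, hidden=128):
        super().__init__()
        # f: feature extractor for the dependent variable x
        self.f = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )
        # g: feature extractor for the conditioning variable y
        self.g = nn.Sequential(
            nn.Linear(y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def energy(self, x, y):
        # Per-sample dot product between the two feature vectors.
        return (self.f(x) * self.g(y)).sum(dim=-1)

# Usage: f(x) is the learned representation; under the identifiability
# conditions it is unique up to scaling and permutation. Training would
# use a standard EBM objective such as noise-contrastive estimation
# (not shown here).
model = ConditionalEBM(x_dim=16, y_dim=10, feat_dim=8)
x = torch.randn(32, 16)
y = torch.randn(32, 10)
e = model.energy(x, y)  # shape: (32,)

The dot-product form is what drives the identifiability analysis: the representation f(x) interacts with the conditioning variable only through a linear pairing, which pins the features down up to scaling and permutation under the stated conditions.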
In summary, CRL has expanded the applicability of both traditional ML and causal inference, and has highlighted settings in which strong causal assumptions may be tenable even with weakly labeled data. You may also refer to the recent CRL overview by Schölkopf et al. (2021), "Towards Causal Representation Learning".
That paper notes that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is therefore causal representation learning: the discovery of high-level causal variables from low-level observations. The paper also delineates some implications of causality for machine learning and proposes key research areas at the intersection of the two communities.