Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets

Bibliographic Details
Published in:Patterns (New York, N.Y.), 2022-05, Vol.3 (5), p.100473-100473, Article 100473
Main Authors: Bing, Xin, Lovelace, Tyler, Bunea, Florentina, Wegkamp, Marten, Kasturi, Sudhir Pai, Singh, Harinder, Benos, Panayiotis V., Das, Jishnu
Format: Article
Language:English
Description
Summary:High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the differences in their data distributions, and their integration to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes/properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data. It outperforms a range of state-of-the-art methods in terms of prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic system immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts including immunosenescence and immune dysregulation.

• ER is a novel interpretable machine-learning method for high-dimensional multi-omic data
• ER outperforms a wide range of state-of-the-art methods in terms of prediction
• Beyond prediction, ER identifies causal latent factors of groups/outcomes of interest
• ER generated novel immunological inferences, consistent with evidence in model organisms

Multi-omic technologies for deep cellular and molecular profiling from model organisms or humans have rapidly expanded. However, existing analytical approaches are constrained by the high dimensionality of these datasets, differences in data distributions, and the inability to generate causal inference beyond predictive biomarkers. To address these issues, we developed a novel interpretable machine-learning framework, Essential Regression (ER). ER integrates high-dimensional multi-omic datasets without distributional assumptions regarding the data and identifies significant latent factors and their causal relationships with system-wide outcomes/properties of interest. ER uses higher-order relationships encapsulated in the latent factors, rather than the individual observables, to home in on novel mechanistic insights. Our approach outperforms a range of state-of-the-art methods in terms of prediction and generates novel immunological inferences, consistent wit
ISSN:2666-3899
DOI:10.1016/j.patter.2022.100473