Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach
Published in: | Biometrics 2022-06, Vol.78 (2), p.679-690 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | With the increasing availability of data in the public domain, there has been a growing interest in exploiting information from external sources to improve the analysis of smaller-scale studies. An emerging challenge in the era of big data is that the subject‐level data are high dimensional, but the external information is at an aggregate level and of a lower dimension. Moreover, heterogeneity and uncertainty in the auxiliary information are often not accounted for in information synthesis. In this paper, we propose a unified framework to summarize various forms of aggregated information via estimating equations and develop a penalized empirical likelihood approach to incorporate such information in logistic regression. When the homogeneity assumption is violated, we extend the method to account for population heterogeneity among different sources of information. When the uncertainty in the external information is not negligible, we propose a variance estimator adjusting for the uncertainty. The proposed estimators are asymptotically more efficient than the conventional penalized maximum likelihood estimator and enjoy the oracle property even with a diverging number of predictors. Simulation studies show that the proposed approaches yield higher accuracy in variable selection than competing methods. We illustrate the proposed methodologies with a pediatric kidney transplant study. |
---|---|
ISSN: | 0006-341X (print); 1541-0420 (online) |
DOI: | 10.1111/biom.13429 |
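The summary describes encoding aggregated external information as estimating equations and combining it with individual-level data through empirical likelihood. The Python sketch below illustrates only that core idea under simplifying assumptions. It is not the authors' penalized estimator and omits the penalty, the heterogeneity extension, and the uncertainty-adjusted variance. It assumes the external source reports coefficients `gamma_ext` of a hypothetical reduced logistic model fitted on a subset `Z` of the covariates. The function names, the simulated data, and the value of `gamma_ext` are all illustrative. The external summary enters as g_i = z_i (y_i - expit(z_i' gamma_ext)). Empirical likelihood weights p_i are chosen so that sum_i p_i g_i = 0, which requires 0 to lie inside the convex hull of the g_i. The full logistic model is then fitted by weighted maximum likelihood.

```python
import numpy as np


def expit(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def external_estimating_function(Z, y, gamma_ext):
    """Hypothetical encoding of an external summary as estimating functions.

    Row i is g_i = z_i * (y_i - expit(z_i' gamma_ext)): the score of a reduced
    logistic model evaluated at the externally reported coefficients.
    """
    return Z * (y - expit(Z @ gamma_ext))[:, None]


def el_weights(G, tol=1e-10, max_iter=100):
    """Empirical likelihood weights maximizing sum(log p_i) subject to
    sum(p_i) = 1 and sum(p_i * g_i) = 0.

    Solves sum g_i / (1 + lam' g_i) = 0 for the Lagrange multiplier lam by
    Newton's method with step halving. Assumes 0 lies inside the convex hull
    of the rows of G.
    """
    n, q = G.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        denom = 1.0 + G @ lam
        W = G / denom[:, None]
        grad = W.sum(axis=0)
        if np.max(np.abs(grad)) < tol:
            break
        hess = -W.T @ W                      # negative definite Hessian
        step = -np.linalg.solve(hess, grad)  # Newton direction
        t = 1.0
        # Halve the step until every implied weight p_i stays in (0, 1).
        while np.any(1.0 + G @ (lam + t * step) <= 1.0 / n):
            t /= 2.0
        lam = lam + t * step
    return 1.0 / (n * (1.0 + G @ lam)), lam


def weighted_logistic(X, y, w, tol=1e-10, max_iter=50):
    """Maximize sum_i w_i * loglik_i of a logistic model by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = expit(X @ beta)
        grad = X.T @ (w * (y - mu))
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])
        delta = np.linalg.solve(hess, grad)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta


# Illustrative simulated data; nothing here comes from the paper.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0, -0.8, 0.5]))).astype(float)
Z = X[:, :2]                       # external model: intercept + first covariate
gamma_ext = np.array([-0.4, 0.8])  # hypothetical external coefficients

G = external_estimating_function(Z, y, gamma_ext)
p, lam = el_weights(G)
beta_hat = weighted_logistic(X, y, p)
print(np.round(beta_hat, 3))
```

The weights satisfy the external moment condition exactly, so the weighted fit is calibrated toward the external summary. When the external and internal populations differ, imposing this constraint as-is can bias the estimate, which is the setting the heterogeneity extension in the summary addresses.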