
Weakly Semi-supervised phenotyping using Electronic Health records

Bibliographic Details
Published in:Journal of biomedical informatics 2022-10, Vol.134, p.104175-104175, Article 104175
Main Authors: Nogues, Isabelle-Emmanuella, Wen, Jun, Lin, Yucong, Liu, Molei, Tedeschi, Sara K., Geva, Alon, Cai, Tianxi, Hong, Chuan
Format: Article
Language:English
Description
Summary:
•WSS-DL yields high phenotyping performance despite using very small numbers of labels.
•WSS-DL combines the strengths of deep learning and weakly supervised learning.
•Weakly supervised learning algorithms may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data.
•WSS-DL has the potential to aid doctors in phenotyping rare diseases as well as conditions susceptible to misdiagnosis.

Electronic Health Record (EHR) based phenotyping is a crucial yet challenging problem in the biomedical field. Though clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data render such tasks challenging, time-consuming, and prohibitively expensive, leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems, owing to their ability to leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods face the challenge of choosing the right cutoff value to generate an optimal classifier. Furthermore, since they only utilize the most informative features (i.e., main ICD and NLP counts), they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes the limitations above.
WSS-DL classifies patient-level disease status through a series of learning stages: 1) generating silver-standard labels, 2) deriving enhanced-silver-standard labels by fitting a weakly supervised deep learning model to data with silver-standard labels as outcomes and high-dimensional EHR features as inputs, and 3) obtaining the final prediction score and classifier by fitting a supervised learning model to data with a minimal number of gold-standard labels as the outcome, and the enhanced-silver-standard labels and a minimal set of the most informative EHR features as inputs. To assess the generalizability of WSS-DL across different phenotypes and medical institutions, we apply WSS-DL to classify a total of 17 diseases, including both acute and chronic conditions, using EHR data from three healthcare systems. Additionally, we determine the minimum quantity of training labels required by WSS-DL to outperform exi
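
The three learning stages described in the summary can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: a plain logistic regression stands in for the deep learning model of stage 2, the silver-standard rule (an ICD count of at least 3) is a hypothetical cutoff, and all names (`fit_logistic`, `X_high`, `main`, `gold_idx`) and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def standardize(X):
    """Center and scale each column (guarding against zero variance)."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def fit_logistic(X, y, lr=0.1, steps=500, l2=1e-3):
    """Gradient-descent logistic regression; returns (weights, bias).
    Stand-in for the weakly supervised deep model, NOT the paper's network."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic cohort (assumption): true status is unknown for most patients.
rng = np.random.default_rng(0)
n, d = 2000, 50
y_true = rng.binomial(1, 0.3, size=n)
icd = rng.poisson(1 + 4 * y_true)            # main ICD count, higher for cases
main = np.log1p(icd)                         # "most informative" feature
X_high = rng.normal(size=(n, d))
X_high[:, :10] += 0.5 * y_true[:, None]      # weak signal in 10 of 50 features

# Stage 1: silver-standard labels from the main surrogate (hypothetical rule).
silver = (icd >= 3).astype(float)

# Stage 2: enhanced-silver-standard labels; silver labels are the outcome,
# high-dimensional features are the input.
Xh = standardize(X_high)
w2, b2 = fit_logistic(Xh, silver)
enhanced = predict_proba(Xh, w2, b2)

# Stage 3: supervised model on a small gold-standard subset; inputs are the
# enhanced-silver score plus the main feature.
gold_idx = rng.choice(n, size=60, replace=False)
X3 = standardize(np.column_stack([enhanced, main]))
w3, b3 = fit_logistic(X3[gold_idx], y_true[gold_idx].astype(float))
final_score = predict_proba(X3, w3, b3)
```

Only stage 3 touches gold-standard labels (60 patients in this sketch), which is where the label efficiency comes from; stages 1 and 2 use the full unlabeled cohort.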
ISSN:1532-0464
1532-0480
DOI:10.1016/j.jbi.2022.104175