Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records
Published in: | Journal of Biomedical Informatics, November 2019, Vol. 99, Article 103291 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Summary: |
• Non-identically independently distributed and private electronic medical records.
• Distributed clustering of patients into clinically meaningful communities.
• Community-based learning keeps all data and computation local to the silos.
• Performant algorithm with high predictive accuracy and low communication cost.
Electronic medical records (EMRs) support the development of machine learning algorithms for predicting disease incidence, patient response to treatment, and other healthcare events. So far, however, most algorithms have been centralized. They take little account of the decentralized, non-identically independently distributed (non-IID), and privacy-sensitive characteristics of EMRs that can complicate data collection, sharing, and learning. To address this challenge, we introduced a community-based federated machine learning (CBFL) algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community. Throughout the learning process, the data were kept local at the hospitals, while locally computed results were aggregated on a server. Evaluation results show that CBFL outperformed the baseline federated machine learning (FL) algorithm on three measures: Area Under the Receiver Operating Characteristic Curve (ROC AUC), Area Under the Precision-Recall Curve (PR AUC), and communication cost between hospitals and the server. Furthermore, differences in performance between communities could be explained by how dissimilar each community was to the others. |
ISSN: | 1532-0464, 1532-0480 |
DOI: | 10.1016/j.jbi.2019.103291 |
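The summary describes CBFL only as two stages. First, patients are clustered into communities without moving records off the hospitals. Second, one model is learned per community while a server aggregates locally computed results.

The Python below is a minimal sketch of that pattern under stated assumptions. It is not the authors' implementation. It substitutes federated k-means for the clustering step. For the per-community learning step, it uses FedAvg-style averaging of logistic-regression weights, weighted by community size. All function names (`federated_kmeans`, `community_fedavg`, `make_patient`, and so on) and the synthetic two-hospital data are hypothetical. Each hospital sends the server only per-cluster sums and counts, then trained weights and member counts. Raw records stay local.

```python
# Hypothetical CBFL-style sketch (NOT the authors' code): federated k-means + FedAvg.
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Record = Tuple[Vector, int]  # (features, binary outcome such as mortality)


def nearest(x: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to x by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))


def local_cluster_stats(records: List[Record], centroids: List[Vector]):
    """Runs at a hospital: per-cluster feature sums and counts, no raw rows."""
    sums = [[0.0] * len(c) for c in centroids]
    counts = [0] * len(centroids)
    for x, _y in records:
        k = nearest(x, centroids)
        counts[k] += 1
        sums[k] = [s + v for s, v in zip(sums[k], x)]
    return sums, counts


def federated_kmeans(hospitals, centroids: List[Vector], rounds: int = 10):
    """Server loop: turn aggregated local sums/counts into new centroids."""
    for _ in range(rounds):
        stats = [local_cluster_stats(recs, centroids) for recs in hospitals]
        updated = []
        for k, old in enumerate(centroids):
            n = sum(counts[k] for _sums, counts in stats)
            if n == 0:
                updated.append(old)  # empty cluster: keep previous centroid
            else:
                updated.append([sum(sums[k][d] for sums, _c in stats) / n
                                for d in range(len(old))])
        centroids = updated
    return centroids


def score(w: Vector, x: Vector) -> float:
    """Logistic-regression probability; w[-1] is the bias term."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clip so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def local_train(records: List[Record], w: Vector, lr=0.1, epochs=5) -> Vector:
    """Runs at a hospital: a few SGD epochs on its own community members."""
    w = list(w)
    for _ in range(epochs):
        for x, y in records:
            g = score(w, x) - y  # d(log-loss)/d(logit)
            for i, xi in enumerate(x):
                w[i] -= lr * g * xi
            w[-1] -= lr * g
    return w


def community_fedavg(hospitals, centroids, rounds: int = 20) -> Dict[int, Vector]:
    """One model per community, averaged across hospitals weighted by size."""
    dim = len(centroids[0])
    models = {k: [0.0] * (dim + 1) for k in range(len(centroids))}
    for _ in range(rounds):
        for k in models:
            updates = []
            for recs in hospitals:
                members = [(x, y) for x, y in recs if nearest(x, centroids) == k]
                if members:
                    updates.append((local_train(members, models[k]), len(members)))
            total = sum(n for _w, n in updates)
            if total:
                models[k] = [sum(w[i] * n for w, n in updates) / total
                             for i in range(dim + 1)]
    return models


def predict(models, centroids, x: Vector) -> float:
    """Route a patient to the model of its nearest community."""
    return score(models[nearest(x, centroids)], x)


# Synthetic toy data (hypothetical, not ICU records): the outcome depends on
# feature s in opposite directions in the two communities.
def make_patient(rng: random.Random, community: int) -> Record:
    x0 = (-3.0 if community == 0 else 3.0) + rng.uniform(-0.5, 0.5)
    s = rng.uniform(-1.0, 1.0)
    y = int(s > 0) if community == 0 else int(s < 0)
    return [x0, s], y


rng = random.Random(0)
hospital_a = [make_patient(rng, 0) for _ in range(40)] + [make_patient(rng, 1) for _ in range(10)]
hospital_b = [make_patient(rng, 0) for _ in range(10)] + [make_patient(rng, 1) for _ in range(40)]
centroids = federated_kmeans([hospital_a, hospital_b], [[-1.0, 0.0], [1.0, 0.0]])
models = community_fedavg([hospital_a, hospital_b], centroids)
```

The toy outcome follows an XOR-like rule: it is positive when `s > 0` in one community and when `s < 0` in the other. No single linear model over both features can separate that pattern, but two community models can. Whether real ICU communities differ in this way is an empirical question the sketch does not address. This summary also does not give the paper's actual patient encoding, clustering, or aggregation details.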