
Time-sensitive clinical concept embeddings learned from large electronic health records

Bibliographic Details
Published in: BMC Medical Informatics and Decision Making, 2019-04, Vol. 19 (Suppl 2), p. 58, Article 58
Main Authors: Xiang, Yang, Xu, Jun, Si, Yuqi, Li, Zhiheng, Rasmy, Laila, Zhou, Yujia, Tiryaki, Firat, Li, Fang, Zhang, Yaoyun, Wu, Yonghui, Jiang, Xiaoqian, Zheng, Wenjin Jim, Zhi, Degui, Tao, Cui, Xu, Hua
Format: Article
Language: English
Description
Summary: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area for deep learning in the medical domain. However, many existing methods do not consider temporal dependencies along the longitudinal sequence of a patient's records, which may lead to the selection of incorrect contexts. To address this issue, we extended three popular concept embedding learning methods to take time-sensitive information into account: word2vec, positive pointwise mutual information (PPMI), and FastText. We then trained them on a large electronic health records (EHR) database containing about 50 million patients to generate concept embeddings. We evaluated the embeddings both intrinsically, with concept similarity measures, and extrinsically, on the downstream task of predicting disease onset. Our experiments show that embeddings learned from information within a single visit (time window zero) improve performance on the concept similarity measures, and that FastText usually performed better than the other two algorithms. For the predictive modeling task, the best result was achieved by word2vec embeddings with a 30-day sliding window. Considering time constraints is important when training clinical concept embeddings, and we expect such embeddings to benefit a range of downstream applications.
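The central idea in the summary is choosing a concept's contexts by elapsed time in the patient record rather than by position in a token sequence. The short Python sketch below illustrates that idea. It is not the authors' implementation, and it rests on several assumptions. Each record is a list of (day, concept) events. Events on the same day stand in for one visit. Every event is paired with all other events within `window_days` of it. The helper names `time_window_pairs` and `ppmi_scores` and the concept codes in the example record are hypothetical. The resulting pairs feed a plain PPMI computation, PPMI(t, c) = max(0, log(p(t, c) / (p(t) p(c)))). A word2vec or FastText variant could, in principle, consume the same time-constrained (target, context) pairs in place of a fixed-width token window.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event format: (day offset within one patient's record, concept code).
Event = Tuple[int, str]
Pair = Tuple[str, str]


def time_window_pairs(events: List[Event], window_days: int) -> List[Pair]:
    """Pair each event with every other event recorded within `window_days` of it.

    Assumption: window_days=0 keeps only same-day events, used as a stand-in for "one visit".
    """
    ordered = sorted(events)
    pairs: List[Pair] = []
    for i, (day_i, target) in enumerate(ordered):
        for j, (day_j, context) in enumerate(ordered):
            # Skip the event itself; keep any other event inside the time window.
            if i != j and abs(day_i - day_j) <= window_days:
                pairs.append((target, context))
    return pairs


def ppmi_scores(pairs: List[Pair]) -> Dict[Pair, float]:
    """Positive PMI for each observed (target, context) pair; non-positive values are dropped."""
    pair_counts = Counter(pairs)
    target_counts = Counter(t for t, _ in pairs)
    context_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    scores: Dict[Pair, float] = {}
    for (t, c), n in pair_counts.items():
        # PMI = log(p(t, c) / (p(t) * p(c))), with every probability estimated from pair counts.
        pmi = math.log(n * total / (target_counts[t] * context_counts[c]))
        if pmi > 0:
            scores[(t, c)] = pmi
    return scores


if __name__ == "__main__":
    # Hypothetical single-patient record; the codes are illustrative only.
    record = [
        (0, "dx:E11.9"), (0, "rx:metformin"), (0, "lab:HbA1c"),
        (20, "dx:I10"), (20, "rx:lisinopril"),
        (200, "dx:E78.5"),
    ]
    for window in (0, 30):
        pairs = time_window_pairs(record, window)
        print(window, len(pairs), ("dx:E11.9", "dx:I10") in pairs)
```

In this toy record, a zero-day window yields 8 pairs, and the day-0 and day-20 concepts never become contexts of one another. A 30-day window yields 20 pairs and links them. The day-200 event stays isolated in both cases. The example only shows how the window size changes which concepts count as contexts. It makes no claim about which window is better; the summary reports that on real EHR data.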
ISSN: 1472-6947
DOI: 10.1186/s12911-019-0766-3