Preliminary Analysis of the Risk Factor Identification Embedding Model for Cardiovascular Disease
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | Cardiovascular Disease (CVD) is responsible for a large part of healthcare costs every year, but susceptibility to it is affected by complex biological and physiological variables, including patients' genetics and lifestyles. There has not been much work to develop a framework that incorporates these important and clinically relevant risk factors into a comprehensive model for CVD research. Moreover, the data labeling required to do so, such as annotating gene functions, is an extremely challenging, tedious, and time-consuming process. In this work, our goal was to develop and validate a risk factor embedding model that incorporates genotype and phenotype without pre-labeled information to identify various risk factors of CVD. We hypothesize that (1) background knowledge that does not require data labeling can be gathered from published abstracts, and (2) phenotype and genotype risk factors can be represented in an embedding vector space. We collected 1,363,682 published abstracts from PubMed using the keyword "heart" and 19,264 human gene names, then trained our model on the collected abstracts. We evaluated our CVD risk factor identification model using both intrinsic and extrinsic evaluations. For the intrinsic evaluation, we examined whether the top-10 words and genes captured for the input query "myocardial infarction", one type of CVD, have references relating them to the query, and our model correctly identified them. For the extrinsic evaluation, we applied our model to a dimensionality reduction task for classification, and our method outperformed other popular methods. These results show the feasibility of our approach for identifying disease-associated risk factors of CVD that incorporates genotype and phenotype. Clinical Relevance: Our model provides a comprehensive tool to incorporate various risk factors for CVD without any a priori data labeling knowledge. Our approach shows potential to provide discovered knowledge that contributes to better understanding and treatment of CVD. |
---|---|
ISSN: | 2694-0604 |
DOI: | 10.1109/EMBC46164.2021.9630039 |
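
The intrinsic evaluation described in the summary (retrieving the top-10 words and genes closest to the query "myocardial infarction") can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the trained model exposes one vector per term and that closeness is measured by cosine similarity, which the record does not confirm. The function names `cosine` and `top_k`, the placeholder terms, and all vector values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query, embeddings, k=10):
    """Return the k terms most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (term, cosine(q, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, made-up vectors for illustration only; not values from the paper.
# In the setting described above, keys would be words and gene names.
toy_embeddings = {
    "myocardial_infarction": [0.9, 0.1, 0.0],
    "term_a": [0.8, 0.2, 0.1],
    "term_b": [0.0, 1.0, 0.0],
    "term_c": [0.1, 0.0, 1.0],
}

neighbors = top_k("myocardial_infarction", toy_embeddings, k=2)
for term, score in neighbors:
    print(f"{term}\t{score:.3f}")
```

In an evaluation like the one summarized, each retrieved neighbor would then be checked against the literature for a documented link to the query term.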