Multi-layer framework of identifying placenta related research towards Placenta Curated Research Dataset (PCRD) development for the PAT project
Published in: | Journal of biomedical informatics 2019-06, Vol.94, p.103191-103191, Article 103191 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: |
• A novel multi-layer framework to identify placenta research from PubMed systematically.
• A Naïve Bayes text classifier developed with MeSH terms as features.
• NLP components and human judgement seamlessly integrated for text classification.
The placenta is a maternal-fetal organ that develops during pregnancy, supplying the growing fetus with nutrients and oxygen and removing its waste products. A better understanding of the placenta promises to improve the health of mothers and children, given its lifelong influence on health. However, the placenta remains poorly understood because it varies across species and its live function cannot be observed after the baby is delivered. The Placenta Atlas Tool (PAT) project aims to leverage advanced computational approaches to meld numerous and diverse datasets into an integrated resource that encourages a “systems biology” approach to the study of both normal and abnormal placental development and function throughout gestation.
In this study, we introduced a multi-layer framework that automatically identifies PAT-relevant research in PubMed and builds a Placenta Curated Research Dataset (PCRD) to support placenta research. The framework combines several well-known Natural Language Processing (NLP) components: a Naïve Bayes classifier based on Medical Subject Headings (MeSH), abstract-based text similarity comparison, and MeSH-based article prioritization. Together, these components systematically select PAT-relevant research publications for further data curation. In addition, we developed a user-friendly web application to incorporate human judgement at the final stage of publication identification.
We obtained 22,047 articles from PubMed and, via our framework, programmatically identified 6,086 articles that are highly relevant to PAT. To assess the performance of the framework, we manually reviewed a random set of articles using our web tool. Based on our review, the accuracy of article classification is greater than 90% and the accuracy of prioritization is greater than 80%.
We developed a multi-layer publication identification framework to systematically identify PAT-relevant publications from PubMed. This framework not only performs well in identifying placenta-related research but can also be easily extended to support research in other scientific fields. |
---|---|
ISSN: | 1532-0464 1532-0480 |
DOI: | 10.1016/j.jbi.2019.103191 |
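
The summary above describes a Naïve Bayes classifier that uses MeSH terms as features. The following is a minimal sketch of that general technique, not the authors' implementation: the function names, the label names, the smoothing value, and the four toy training records are all hypothetical and chosen only for illustration. It treats each article as a set of MeSH terms and applies Laplace smoothing.

```python
import math
from collections import Counter

# Hypothetical toy training data: (MeSH terms, label). Labels are invented.
TRAINING = [
    (["Placenta", "Pregnancy", "Trophoblasts"], "relevant"),
    (["Placenta", "Fetal Development", "Pregnancy"], "relevant"),
    (["Neoplasms", "Mice", "Antineoplastic Agents"], "not_relevant"),
    (["Pregnancy", "Diabetes, Gestational", "Insulin"], "not_relevant"),
]


def train_naive_bayes(labeled_articles, alpha=1.0):
    """Count, per class, how many articles carry each MeSH term."""
    class_counts = Counter()
    term_counts = {}
    vocabulary = set()
    for mesh_terms, label in labeled_articles:
        class_counts[label] += 1
        counts = term_counts.setdefault(label, Counter())
        for term in set(mesh_terms):  # each term counts once per article
            counts[term] += 1
            vocabulary.add(term)
    return {
        "class_counts": class_counts,
        "term_counts": term_counts,
        "vocabulary": vocabulary,
        "alpha": alpha,  # Laplace smoothing constant (assumed value)
    }


def predict(model, mesh_terms):
    """Return the class with the highest log posterior for the given terms."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocabulary"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["term_counts"][label]
        denominator = sum(counts.values()) + alpha * vocab_size
        score = math.log(n_docs / total_docs)  # log prior
        for term in set(mesh_terms):
            if term in model["vocabulary"]:  # skip terms never seen in training
                score += math.log((counts[term] + alpha) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAINING)
print(predict(model, ["Placenta", "Trophoblasts"]))
print(predict(model, ["Neoplasms", "Mice"]))
```

In a setting like the one described, the output of such a classifier would be only the first layer; text similarity comparison, prioritization, and human review would follow.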