Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model
Published in: | iScience 2022-10, Vol.25 (10), p.105079, Article 105079 |
---|---|
Format: | Article |
Language: | English |
Summary: | Although open-access data are increasingly common and useful to epidemiological research, the curation of such datasets is resource-intensive and time-consuming. Despite the existence of a major source of COVID-19 data, the regularly disclosed case reports were often written in natural language with an unstructured format. Here, we propose a computational framework that can automatically extract epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model developed using deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to the COVID-19 case reports collected from mainland China, our framework outperforms all other state-of-the-art deep learning models. The information extracted from our approach is highly consistent with that obtained from the gold-standard manual coding, with a matching rate of 80%. To disseminate our algorithm, we provide an open-access online platform that is able to estimate key epidemiological statistics in real time, with much less effort for data curation.
• We propose a method to obtain epidemiological information from COVID-19 case reports
• The extracted information has an 80% matching rate with the gold-standard manual coding
• We provide an online platform that can analyze epidemiological statistics in real time
Health sciences; Virology; Artificial intelligence; Machine learning |
---|---|
ISSN: | 2589-0042 |
DOI: | 10.1016/j.isci.2022.105079 |