
NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models

Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high quality care and allocate essential resources, which is especially important during the COVID-19 pandemic. Patient symptoms are typically found in unstructured notes, and thus not re...

Bibliographic Details
Published in: The Journal of artificial intelligence research 2021-01, Vol.72, p.429-474
Main Authors: Silverman, Greg M., Sahoo, Himanshu S., Ingraham, Nicholas E., Lupei, Monica, Puskarich, Michael A., Usher, Michael, Dries, James, Finzel, Raymond L., Murray, Eric, Sartori, John, Simon, Gyorgy, Zhang, Rui, Melton, Genevieve B., Tignanelli, Christopher J., Pakhomov, Serguei VS
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c263t-16c310e85be5a14122527a34fcf77f5c762f35c57bcfe4453102a21084912de63
cites
container_end_page 474
container_issue
container_start_page 429
container_title The Journal of artificial intelligence research
container_volume 72
creator Silverman, Greg M.
Sahoo, Himanshu S.
Ingraham, Nicholas E.
Lupei, Monica
Puskarich, Michael A.
Usher, Michael
Dries, James
Finzel, Raymond L.
Murray, Eric
Sartori, John
Simon, Gyorgy
Zhang, Rui
Melton, Genevieve B.
Tignanelli, Christopher J.
Pakhomov, Serguei VS
description Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high quality care and allocate essential resources, which is especially important during the COVID-19 pandemic. Patient symptoms are typically found in unstructured notes, and thus not readily available for clinical decision making. In an attempt to fill this gap, this study compared two methods for symptom extraction from Emergency Department (ED) admission notes. Both methods utilized a lexicon derived by expanding The Center for Disease Control and Prevention's (CDC) Symptoms of Coronavirus list. The first method utilized a word2vec model to expand the lexicon using a dictionary mapping to the Unified Medical Language System (UMLS). The second method utilized the expanded lexicon as a rule-based gazetteer and the UMLS. These methods were evaluated against a manually annotated reference (f1-score of 0.87 for UMLS-based ensemble; and 0.85 for rule-based gazetteer with UMLS). Through analyses of associations of extracted symptoms used as features against various outcomes, salient risks among the population of COVID-19 patients, including increased risk of in-hospital mortality (OR 1.85, p-value < 0.001), were identified for patients presenting with dyspnea. Disparities between English and non-English speaking patients were also identified, the most salient being a concerning finding of opposing risk signals between fatigue and in-hospital mortality (non-English: OR 1.95, p-value = 0.02; English: OR 0.63, p-value = 0.01). While use of symptomatology for modeling of outcomes is not unique, unlike previous studies this study showed that models built using symptoms with the outcome of in-hospital mortality were not significantly different from models using data collected during an in-patient encounter (AUC of 0.9 with 95% CI of [0.88, 0.91] using only vital signs; AUC of 0.87 with 95% CI of [0.85, 0.88] using only symptoms).
These findings indicate that prognostic models based on symptomatology could aid in extending COVID-19 patient care through telemedicine, replacing the need for in-person options. The methods presented in this study have potential for use in development of symptomatology-based models for other diseases, including for the study of Post-Acute Sequelae of COVID-19 (PASC).
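As a rough illustration of the rule-based gazetteer approach and the F1 evaluation mentioned in the description, the following is a minimal sketch only. It is not the authors' pipeline: the lexicon entries, the function names `extract_symptoms` and `f1_score`, and the sample note are all hypothetical, and the sketch does no UMLS mapping, word2vec expansion, or negation handling.

```python
from __future__ import annotations

import re

# Hypothetical mini-lexicon (assumption): surface form -> canonical symptom.
# A real lexicon would be far larger and mapped to UMLS concepts.
LEXICON = {
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "dyspnea": "dyspnea",
    "fatigue": "fatigue",
    "tired": "fatigue",
    "fever": "fever",
    "cough": "cough",
}

# Longest terms first, so multi-word phrases win over shorter alternatives.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_symptoms(note: str) -> set[str]:
    """Return canonical symptoms whose lexicon terms appear in the note."""
    return {LEXICON[m.group(1).lower()] for m in _PATTERN.finditer(note)}


def f1_score(predicted: set[str], reference: set[str]) -> float:
    """Harmonic mean of precision and recall over two symptom sets."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    note = "Pt reports SOB and dry cough, feels tired."  # made-up example note
    found = extract_symptoms(note)
    annotated = {"dyspnea", "cough", "fever"}  # made-up manual annotation
    print(sorted(found), round(f1_score(found, annotated), 2))
```

Because the sketch ignores negation, a phrase such as "denies fever" would still count as a fever mention, so it should not be read as clinically usable.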
doi_str_mv 10.1613/jair.1.12631
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2590056376</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2590056376</sourcerecordid><originalsourceid>FETCH-LOGICAL-c263t-16c310e85be5a14122527a34fcf77f5c762f35c57bcfe4453102a21084912de63</originalsourceid><addsrcrecordid>eNpNkE1PwzAMhiMEEmNw4wdE4kpHnDTNepw2PiZtbBKMa5SlCXRqm5GkEvv3tBsHTrbsx9arB6FbICPIgD3sVOlHMAKaMThDAyAiS3LBxfm__hJdhbAjBPKUjgeoel2s8dLEL1cEbJ3Hjz_RKx1L12Bn8duh3kdXdyvvarxpQvStjq03BZ6pqI4Xm2Bw2eC1d5-NC7HUeLr6mM8SyPGkUdWhnyxdYapwjS6sqoK5-atDtHl6fJ--JIvV83w6WSS6Sx4TyDQDYsZ8a7iCFCjlVCiWWm2FsFyLjFrGNRdbbU2a8g6migIZpznQwmRsiO5Of_fefbcmRLlzre-yBEl5TgjPmOip-xOlvQvBGyv3vqyVP0ggsvcpe58S5NEn-wXdrGes</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2590056376</pqid></control><display><type>article</type><title>NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models</title><source>Publicly Available Content Database</source><source>Coronavirus Research Database</source><creator>Silverman, Greg M. ; Sahoo, Himanshu S. ; Ingraham, Nicholas E. ; Lupei, Monica ; Puskarich, Michael A. ; Usher, Michael ; Dries, James ; Finzel, Raymond L. ; Murray, Eric ; Sartori, John ; Simon, Gyorgy ; Zhang, Rui ; Melton, Genevieve B. ; Tignanelli, Christopher J. ; Pakhomov, Serguei VS</creator><creatorcontrib>Silverman, Greg M. ; Sahoo, Himanshu S. ; Ingraham, Nicholas E. ; Lupei, Monica ; Puskarich, Michael A. ; Usher, Michael ; Dries, James ; Finzel, Raymond L. ; Murray, Eric ; Sartori, John ; Simon, Gyorgy ; Zhang, Rui ; Melton, Genevieve B. ; Tignanelli, Christopher J. 
; Pakhomov, Serguei VS</creatorcontrib><description>Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high quality care and allocate essential resources, which is especially important during the COVID-19 pandemic. Patient symptoms are typically found in unstructured notes, and thus not readily available for clinical decision making. In an attempt to fill this gap, this study compared two methods for symptom extraction from Emergency Department (ED) admission notes. Both methods utilized a lexicon derived by expanding The Center for Disease Control and Prevention's (CDC) Symptoms of Coronavirus list. The first method utilized a word2vec model to expand the lexicon using a dictionary mapping to the Unified Medical Language System (UMLS). The second method utilized the expanded lexicon as a rule-based gazetteer and the UMLS. These methods were evaluated against a manually annotated reference (f1-score of 0.87 for UMLS-based ensemble; and 0.85 for rule-based gazetteer with UMLS). Through analyses of associations of extracted symptoms used as features against various outcomes, salient risks among the population of COVID-19 patients, including increased risk of in-hospital mortality (OR 1.85, p-value &lt; 0.001), were identified for patients presenting with dyspnea. Disparities between English and non-English speaking patients were also identified, the most salient being a concerning finding of opposing risk signals between fatigue and in-hospital mortality (non-English: OR 1.95, p-value = 0.02; English: OR 0.63, p-value = 0.01).
While use of symptomatology for modeling of outcomes is not unique, unlike previous studies this study showed that models built using symptoms with the outcome of in-hospital mortality were not significantly different from models using data collected during an in-patient encounter (AUC of 0.9 with 95% CI of [0.88, 0.91] using only vital signs; AUC of 0.87 with 95% CI of [0.85, 0.88] using only symptoms). These findings indicate that prognostic models based on symptomatology could aid in extending COVID-19 patient care through telemedicine, replacing the need for in-person options. The methods presented in this study have potential for use in development of symptomatology-based models for other diseases, including for the study of Post-Acute Sequelae of COVID-19 (PASC).</description><identifier>ISSN: 1076-9757</identifier><identifier>EISSN: 1076-9757</identifier><identifier>EISSN: 1943-5037</identifier><identifier>DOI: 10.1613/jair.1.12631</identifier><language>eng</language><publisher>San Francisco: AI Access Foundation</publisher><subject>Artificial intelligence ; Coronaviruses ; COVID-19 ; Decision making ; Disease control ; Dyspnea ; Emergency medical services ; Feature extraction ; Hospitals ; Mortality ; Patients ; Signs and symptoms ; Statistical models ; Unstructured data ; Viral diseases</subject><ispartof>The Journal of artificial intelligence research, 2021-01, Vol.72, p.429-474</ispartof><rights>2021. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the associated terms available at https://www.jair.org/index.php/jair/about</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c263t-16c310e85be5a14122527a34fcf77f5c762f35c57bcfe4453102a21084912de63</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2590056376?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,25753,27924,27925,37012,38516,43895,44590</link.rule.ids></links><search><creatorcontrib>Silverman, Greg M.</creatorcontrib><creatorcontrib>Sahoo, Himanshu S.</creatorcontrib><creatorcontrib>Ingraham, Nicholas E.</creatorcontrib><creatorcontrib>Lupei, Monica</creatorcontrib><creatorcontrib>Puskarich, Michael A.</creatorcontrib><creatorcontrib>Usher, Michael</creatorcontrib><creatorcontrib>Dries, James</creatorcontrib><creatorcontrib>Finzel, Raymond L.</creatorcontrib><creatorcontrib>Murray, Eric</creatorcontrib><creatorcontrib>Sartori, John</creatorcontrib><creatorcontrib>Simon, Gyorgy</creatorcontrib><creatorcontrib>Zhang, Rui</creatorcontrib><creatorcontrib>Melton, Genevieve B.</creatorcontrib><creatorcontrib>Tignanelli, Christopher J.</creatorcontrib><creatorcontrib>Pakhomov, Serguei VS</creatorcontrib><title>NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models</title><title>The Journal of artificial intelligence research</title><description>Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high quality care and allocate essential resources, which is especially important during the COVID-19 pandemic. 
Patient symptoms are typically found in unstructured notes, and thus not readily available for clinical decision making. In an attempt to fill this gap, this study compared two methods for symptom extraction from Emergency Department (ED) admission notes. Both methods utilized a lexicon derived by expanding The Center for Disease Control and Prevention's (CDC) Symptoms of Coronavirus list. The first method utilized a word2vec model to expand the lexicon using a dictionary mapping to the Unified Medical Language System (UMLS). The second method utilized the expanded lexicon as a rule-based gazetteer and the UMLS. These methods were evaluated against a manually annotated reference (f1-score of 0.87 for UMLS-based ensemble; and 0.85 for rule-based gazetteer with UMLS). Through analyses of associations of extracted symptoms used as features against various outcomes, salient risks among the population of COVID-19 patients, including increased risk of in-hospital mortality (OR 1.85, p-value &lt; 0.001), were identified for patients presenting with dyspnea. Disparities between English and non-English speaking patients were also identified, the most salient being a concerning finding of opposing risk signals between fatigue and in-hospital mortality (non-English: OR 1.95, p-value = 0.02; English: OR 0.63, p-value = 0.01). While use of symptomatology for modeling of outcomes is not unique, unlike previous studies this study showed that models built using symptoms with the outcome of in-hospital mortality were not significantly different from models using data collected during an in-patient encounter (AUC of 0.9 with 95% CI of [0.88, 0.91] using only vital signs; AUC of 0.87 with 95% CI of [0.85, 0.88] using only symptoms). These findings indicate that prognostic models based on symptomatology could aid in extending COVID-19 patient care through telemedicine, replacing the need for in-person options.
The methods presented in this study have potential for use in development of symptomatology-based models for other diseases, including for the study of Post-Acute Sequelae of COVID-19 (PASC).</description><subject>Artificial intelligence</subject><subject>Coronaviruses</subject><subject>COVID-19</subject><subject>Decision making</subject><subject>Disease control</subject><subject>Dyspnea</subject><subject>Emergency medical services</subject><subject>Feature extraction</subject><subject>Hospitals</subject><subject>Mortality</subject><subject>Patients</subject><subject>Signs and symptoms</subject><subject>Statistical models</subject><subject>Unstructured data</subject><subject>Viral diseases</subject><issn>1076-9757</issn><issn>1076-9757</issn><issn>1943-5037</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>COVID</sourceid><sourceid>PIMPY</sourceid><recordid>eNpNkE1PwzAMhiMEEmNw4wdE4kpHnDTNepw2PiZtbBKMa5SlCXRqm5GkEvv3tBsHTrbsx9arB6FbICPIgD3sVOlHMAKaMThDAyAiS3LBxfm__hJdhbAjBPKUjgeoel2s8dLEL1cEbJ3Hjz_RKx1L12Bn8duh3kdXdyvvarxpQvStjq03BZ6pqI4Xm2Bw2eC1d5-NC7HUeLr6mM8SyPGkUdWhnyxdYapwjS6sqoK5-atDtHl6fJ--JIvV83w6WSS6Sx4TyDQDYsZ8a7iCFCjlVCiWWm2FsFyLjFrGNRdbbU2a8g6migIZpznQwmRsiO5Of_fefbcmRLlzre-yBEl5TgjPmOip-xOlvQvBGyv3vqyVP0ggsvcpe58S5NEn-wXdrGes</recordid><startdate>20210101</startdate><enddate>20210101</enddate><creator>Silverman, Greg M.</creator><creator>Sahoo, Himanshu S.</creator><creator>Ingraham, Nicholas E.</creator><creator>Lupei, Monica</creator><creator>Puskarich, Michael A.</creator><creator>Usher, Michael</creator><creator>Dries, James</creator><creator>Finzel, Raymond L.</creator><creator>Murray, Eric</creator><creator>Sartori, John</creator><creator>Simon, Gyorgy</creator><creator>Zhang, Rui</creator><creator>Melton, Genevieve B.</creator><creator>Tignanelli, Christopher J.</creator><creator>Pakhomov, Serguei VS</creator><general>AI Access 
Foundation</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope></search><sort><creationdate>20210101</creationdate><title>NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models</title><author>Silverman, Greg M. ; Sahoo, Himanshu S. ; Ingraham, Nicholas E. ; Lupei, Monica ; Puskarich, Michael A. ; Usher, Michael ; Dries, James ; Finzel, Raymond L. ; Murray, Eric ; Sartori, John ; Simon, Gyorgy ; Zhang, Rui ; Melton, Genevieve B. ; Tignanelli, Christopher J. ; Pakhomov, Serguei VS</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c263t-16c310e85be5a14122527a34fcf77f5c762f35c57bcfe4453102a21084912de63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Artificial intelligence</topic><topic>Coronaviruses</topic><topic>COVID-19</topic><topic>Decision making</topic><topic>Disease control</topic><topic>Dyspnea</topic><topic>Emergency medical services</topic><topic>Feature extraction</topic><topic>Hospitals</topic><topic>Mortality</topic><topic>Patients</topic><topic>Signs and symptoms</topic><topic>Statistical models</topic><topic>Unstructured data</topic><topic>Viral diseases</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Silverman, Greg M.</creatorcontrib><creatorcontrib>Sahoo, Himanshu S.</creatorcontrib><creatorcontrib>Ingraham, Nicholas E.</creatorcontrib><creatorcontrib>Lupei, Monica</creatorcontrib><creatorcontrib>Puskarich, Michael 
A.</creatorcontrib><creatorcontrib>Usher, Michael</creatorcontrib><creatorcontrib>Dries, James</creatorcontrib><creatorcontrib>Finzel, Raymond L.</creatorcontrib><creatorcontrib>Murray, Eric</creatorcontrib><creatorcontrib>Sartori, John</creatorcontrib><creatorcontrib>Simon, Gyorgy</creatorcontrib><creatorcontrib>Zhang, Rui</creatorcontrib><creatorcontrib>Melton, Genevieve B.</creatorcontrib><creatorcontrib>Tignanelli, Christopher J.</creatorcontrib><creatorcontrib>Pakhomov, Serguei VS</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer science database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>The Journal of artificial intelligence research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Silverman, Greg M.</au><au>Sahoo, Himanshu S.</au><au>Ingraham, Nicholas E.</au><au>Lupei, 
Monica</au><au>Puskarich, Michael A.</au><au>Usher, Michael</au><au>Dries, James</au><au>Finzel, Raymond L.</au><au>Murray, Eric</au><au>Sartori, John</au><au>Simon, Gyorgy</au><au>Zhang, Rui</au><au>Melton, Genevieve B.</au><au>Tignanelli, Christopher J.</au><au>Pakhomov, Serguei VS</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models</atitle><jtitle>The Journal of artificial intelligence research</jtitle><date>2021-01-01</date><risdate>2021</risdate><volume>72</volume><spage>429</spage><epage>474</epage><pages>429-474</pages><issn>1076-9757</issn><eissn>1076-9757</eissn><eissn>1943-5037</eissn><abstract>Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high quality care and allocate essential resources, which is especially important during the COVID-19 pandemic. Patient symptoms are typically found in unstructured notes, and thus not readily available for clinical decision making. In an attempt to fill this gap, this study compared two methods for symptom extraction from Emergency Department (ED) admission notes. Both methods utilized a lexicon derived by expanding The Center for Disease Control and Prevention's (CDC) Symptoms of Coronavirus list. The first method utilized a word2vec model to expand the lexicon using a dictionary mapping to the Unified Medical Language System (UMLS). The second method utilized the expanded lexicon as a rule-based gazetteer and the UMLS. These methods were evaluated against a manually annotated reference (f1-score of 0.87 for UMLS-based ensemble; and 0.85 for rule-based gazetteer with UMLS).
Through analyses of associations of extracted symptoms used as features against various outcomes, salient risks among the population of COVID-19 patients, including increased risk of in-hospital mortality (OR 1.85, p-value &lt; 0.001), were identified for patients presenting with dyspnea. Disparities between English and non-English speaking patients were also identified, the most salient being a concerning finding of opposing risk signals between fatigue and in-hospital mortality (non-English: OR 1.95, p-value = 0.02; English: OR 0.63, p-value = 0.01). While use of symptomatology for modeling of outcomes is not unique, unlike previous studies this study showed that models built using symptoms with the outcome of in-hospital mortality were not significantly different from models using data collected during an in-patient encounter (AUC of 0.9 with 95% CI of [0.88, 0.91] using only vital signs; AUC of 0.87 with 95% CI of [0.85, 0.88] using only symptoms). These findings indicate that prognostic models based on symptomatology could aid in extending COVID-19 patient care through telemedicine, replacing the need for in-person options. The methods presented in this study have potential for use in development of symptomatology-based models for other diseases, including for the study of Post-Acute Sequelae of COVID-19 (PASC).</abstract><cop>San Francisco</cop><pub>AI Access Foundation</pub><doi>10.1613/jair.1.12631</doi><tpages>46</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1076-9757
ispartof The Journal of artificial intelligence research, 2021-01, Vol.72, p.429-474
issn 1076-9757
1076-9757
1943-5037
language eng
recordid cdi_proquest_journals_2590056376
source Publicly Available Content Database; Coronavirus Research Database
subjects Artificial intelligence
Coronaviruses
COVID-19
Decision making
Disease control
Dyspnea
Emergency medical services
Feature extraction
Hospitals
Mortality
Patients
Signs and symptoms
Statistical models
Unstructured data
Viral diseases
title NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T22%3A24%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=NLP%20Methods%20for%20Extraction%20of%20Symptoms%20from%20Unstructured%20Data%20for%20Use%20in%20Prognostic%20COVID-19%20Analytic%20Models&rft.jtitle=The%20Journal%20of%20artificial%20intelligence%20research&rft.au=Silverman,%20Greg%20M.&rft.date=2021-01-01&rft.volume=72&rft.spage=429&rft.epage=474&rft.pages=429-474&rft.issn=1076-9757&rft.eissn=1076-9757&rft_id=info:doi/10.1613/jair.1.12631&rft_dat=%3Cproquest_cross%3E2590056376%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c263t-16c310e85be5a14122527a34fcf77f5c762f35c57bcfe4453102a21084912de63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2590056376&rft_id=info:pmid/&rfr_iscdi=true