Using rule-based natural language processing to improve disease normalization in biomedical text

In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. I...

Bibliographic Details
Published in: Journal of the American Medical Informatics Association : JAMIA 2013-09, Vol.20 (5), p.876-881
Main Authors: Kang, Ning; Singh, Bharat; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A
Format: Article
Language: English
container_end_page 881
container_issue 5
container_start_page 876
container_title Journal of the American Medical Informatics Association : JAMIA
container_volume 20
creator Kang, Ning
Singh, Bharat
Afzal, Zubair
van Mulligen, Erik M
Kors, Jan A
description In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard and for concept identifier matching. Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performances further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching. We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated.
doi_str_mv 10.1136/amiajnl-2012-001173
format article
pmid 23043124
publisher England: BMJ Publishing Group
rights Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions 2013
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3756254/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3756254/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/23043124
fulltext fulltext
identifier ISSN: 1067-5027
ispartof Journal of the American Medical Informatics Association : JAMIA, 2013-09, Vol.20 (5), p.876-881
issn 1067-5027
1527-974X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3756254
source Oxford Journals Online; PubMed Central
subjects Disease - classification
Humans
Information Storage and Retrieval - methods
Natural Language Processing
Research and Applications
Terminology as Topic
Unified Medical Language System
Vocabulary, Controlled
title Using rule-based natural language processing to improve disease normalization in biomedical text
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T05%3A39%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20rule-based%20natural%20language%20processing%20to%20improve%20disease%20normalization%20in%20biomedical%20text&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA&rft.au=Kang,%20Ning&rft.date=2013-09-01&rft.volume=20&rft.issue=5&rft.spage=876&rft.epage=881&rft.pages=876-881&rft.issn=1067-5027&rft.eissn=1527-974X&rft_id=info:doi/10.1136/amiajnl-2012-001173&rft_dat=%3Cproquest_pubme%3E1420166465%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c438t-97143a3455a1be7538ac63d8f337ca3c273f49b522afa092506c4cd745cdc3633%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1420166465&rft_id=info:pmid/23043124&rfr_iscdi=true
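
The abstract reports F-scores under three matching criteria: exact boundary, inexact boundary, and concept identifier matching. The sketch below shows one way such scores could be computed. It is not the authors' evaluation code. The `Annotation` type, the function names, the concept identifiers, and the reading of "inexact" as any character overlap are all assumptions made for illustration, since the abstract does not define the criteria in detail.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """A disease mention (hypothetical structure, not from the paper)."""
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    concept_id: str  # identifier of the normalized concept


def spans_match(gold: Annotation, pred: Annotation, exact: bool) -> bool:
    """Exact: identical boundaries. Inexact (assumed): any overlap."""
    if exact:
        return gold.start == pred.start and gold.end == pred.end
    return gold.start < pred.end and pred.start < gold.end


def f_score(gold: List[Annotation], predicted: List[Annotation],
            exact: bool = True, require_concept: bool = False) -> float:
    """Harmonic mean of precision and recall over matched annotations."""
    unmatched_gold = list(gold)  # each gold annotation may be matched once
    true_positives = 0
    for pred in predicted:
        for g in unmatched_gold:
            same_concept = g.concept_id == pred.concept_id
            if spans_match(g, pred, exact) and (same_concept or not require_concept):
                unmatched_gold.remove(g)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a system annotation that overlaps a gold mention but misses a boundary counts as an error for exact matching and as a hit for inexact matching, which is consistent with the higher inexact-matching scores the abstract reports.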