
The CHEMDNER corpus of chemicals and drugs and its annotation principles

Bibliographic Details
Published in: Journal of cheminformatics 2015, Vol.7 (Suppl 1), p.S2-S2, Article S2
Main Authors: Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, SV; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c531t-7a2cddfd2ca25de63e58f535ae7927c99899e6fc1a845d219567d181bf1c68c03
cites cdi_FETCH-LOGICAL-c531t-7a2cddfd2ca25de63e58f535ae7927c99899e6fc1a845d219567d181bf1c68c03
container_end_page S2
container_issue Suppl 1
container_start_page S2
container_title Journal of cheminformatics
container_volume 7
creator Krallinger, Martin
Rabal, Obdulia
Leitner, Florian
Vazquez, Miguel
Salgado, David
Lu, Zhiyong
Leaman, Robert
Lu, Yanan
Ji, Donghong
Lowe, Daniel M
Sayle, Roger A
Batista-Navarro, Riza Theresa
Rak, Rafal
Huber, Torsten
Rocktäschel, Tim
Matos, Sérgio
Campos, David
Tang, Buzhou
Xu, Hua
Munkhdalai, Tsendsuren
Ryu, Keun Ho
Ramanan, SV
Nathan, Senthil
Žitnik, Slavko
Bajec, Marko
Weber, Lutz
Irmer, Matthias
Akhondi, Saber A
Kors, Jan A
Xu, Shuo
An, Xin
Sikdar, Utpal Kumar
Ekbal, Asif
Yoshioka, Masaharu
Dieb, Thaer M
Choi, Miji
Verspoor, Karin
Khabsa, Madian
Giles, C Lee
Liu, Hongfang
Ravikumar, Komandur Elayavilli
Lamurias, Andre
Couto, Francisco M
Dai, Hong-Jie
Tsai, Richard Tzong-Han
Ata, Caglar
Can, Tolga
Usié, Anabel
Alves, Rui
Segura-Bedmar, Isabel
Martínez, Paloma
Oyarzabal, Julen
Valencia, Alfonso
description The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/
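As a rough illustration of the description above, the seven SACEM classes and a single annotated mention could be modelled as follows. This is only a sketch under stated assumptions: the names SacemClass, Mention and mentions_per_abstract, the field layout, and the placeholder PMID are hypothetical and do not reflect the actual CHEMDNER file format. The only figures taken from the record are the class names and the corpus totals (84,355 mentions in 10,000 abstracts).

```python
from dataclasses import dataclass
from enum import Enum


class SacemClass(Enum):
    """The seven SACEM classes listed in the description."""
    ABBREVIATION = "abbreviation"
    FAMILY = "family"
    FORMULA = "formula"
    IDENTIFIER = "identifier"
    MULTIPLE = "multiple"
    SYSTEMATIC = "systematic"
    TRIVIAL = "trivial"


@dataclass(frozen=True)
class Mention:
    # Hypothetical field layout, not the corpus' real column order.
    pmid: str          # PubMed identifier of the abstract
    start: int         # character offset where the mention begins
    end: int           # character offset just past the mention
    text: str          # surface string of the mention
    sacem: SacemClass  # manually assigned class


def mentions_per_abstract(total_mentions: int, total_abstracts: int) -> float:
    """Average number of chemical mentions per abstract."""
    return total_mentions / total_abstracts


# Illustrative record with a placeholder PMID (not a real corpus entry).
example = Mention("0000000", 0, 7, "aspirin", SacemClass.TRIVIAL)

# Corpus totals reported in the description.
density = mentions_per_abstract(84_355, 10_000)
print(f"{density:.2f} mentions per abstract on average")
```

With the reported totals, the corpus averages a little over eight chemical mentions per abstract, which gives a sense of the annotation density a supervised NER system would train on.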
doi_str_mv 10.1186/1758-2946-7-S1-S2
format article
eissn 1758-2946
pmid 25810773
publisher Cham: Springer International Publishing
rights 2015 Krallinger et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
fulltext fulltext
identifier ISSN: 1758-2946
ispartof Journal of cheminformatics, 2015, Vol.7 (Suppl 1), p.S2-S2, Article S2
issn 1758-2946
1758-2946
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4331692
source Springer Nature - SpringerLink Journals - Fully Open Access ; ProQuest - Publicly Available Content Database; PubMed Central; Free Full-Text Journals in Chemistry
subjects Chemistry
Chemistry and Materials Science
Computational Biology/Bioinformatics
Computer Applications in Chemistry
Documentation and Information in Chemistry
Theoretical and Computational Chemistry
title The CHEMDNER corpus of chemicals and drugs and its annotation principles
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T10%3A43%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20CHEMDNER%20corpus%20of%20chemicals%20and%20drugs%20and%20its%20annotation%20principles&rft.jtitle=Journal%20of%20cheminformatics&rft.au=Krallinger,%20Martin&rft.date=2015&rft.volume=7&rft.issue=Suppl%201&rft.spage=S2&rft.epage=S2&rft.pages=S2-S2&rft.artnum=S2&rft.issn=1758-2946&rft.eissn=1758-2946&rft_id=info:doi/10.1186/1758-2946-7-S1-S2&rft_dat=%3Cproquest_pubme%3E4312328661%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c531t-7a2cddfd2ca25de63e58f535ae7927c99899e6fc1a845d219567d181bf1c68c03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1646878096&rft_id=info:pmid/25810773&rfr_iscdi=true