
Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review

Bibliographic Details
Published in: Bioinformatics 2013-06, Vol.29 (11), p.1440-1447
Main Authors: Guo, Yufan, Silins, Ilona, Stenius, Ulla, Korhonen, Anna
Format: Article
Language: English
container_end_page 1447
container_issue 11
container_start_page 1440
container_title Bioinformatics
container_volume 29
creator Guo, Yufan
Silins, Ilona
Stenius, Ulla
Korhonen, Anna
description Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and to apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of biomedical abstracts. But is this realistic for full articles, given their high linguistic and informational complexity? We introduce and release a novel corpus of 50 biomedical articles annotated according to the Argumentative Zoning (AZ) scheme, and investigate active learning on this corpus with one of the most widely used ML models, Support Vector Machines (SVM). Additionally, we introduce two novel applications that use AZ to support real-life literature review in biomedicine via question answering and summarization. We show that active learning with SVM trained on 500 labeled sentences (6% of the corpus) performs surprisingly well, with an accuracy of 82%, just 2% lower than fully supervised learning. In our question answering task, biomedical researchers find relevant information significantly faster in AZ-annotated articles than in unannotated ones. In the summarization task, sentences extracted from particular zones are significantly more similar to gold-standard summaries than those extracted from particular sections of full articles. These results demonstrate that active learning of full articles' information structure is indeed realistic and that its accuracy is high enough to support real-life literature review in biomedicine. The annotated corpus, our AZ classifier and the two novel applications are available at http://www.cl.cam.ac.uk/yg244/12bioinfo.html
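The abstract names the technique (active learning with an SVM) but not the query strategy or the training setup. As a hedged illustration only, the sketch below assumes pool-based active learning with margin-based uncertainty sampling, a common pairing with SVMs: after each retraining, the unlabeled example closest to the decision boundary (smallest |w · x|) is sent to the annotator. It is binary, whereas AZ assigns one of several zone labels (a one-vs-rest wrapper would be one way to extend it), and the linear SVM is trained with the Pegasos sub-gradient method instead of any particular library. All function names (`train_linear_svm`, `most_uncertain`, `active_learning`), parameters and the toy data are hypothetical; this is not the authors' implementation.

```python
import random
from typing import Callable, List, Optional, Sequence, Set, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Binary linear SVM (labels +1/-1) trained with the Pegasos
    stochastic sub-gradient method on the L2-regularised hinge loss.
    Append a constant 1.0 feature to each x to get a (regularised) bias."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # margin check uses the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink step
            if violated:  # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def most_uncertain(w: Vector, pool: List[Vector], labeled: Set[int]) -> Optional[int]:
    """Index of the unlabeled example closest to the boundary (smallest |w . x|)."""
    candidates = [i for i in range(len(pool)) if i not in labeled]
    if not candidates:
        return None
    return min(candidates, key=lambda i: abs(dot(w, pool[i])))


def active_learning(pool: List[Vector], oracle: Callable[[int], int],
                    seed_idx: List[int], budget: int) -> Tuple[Vector, List[int]]:
    """Start from a small labeled seed set, then alternate retraining and
    querying the oracle (the annotator) until `budget` examples are labeled."""
    labeled = list(seed_idx)
    labels = {i: oracle(i) for i in labeled}

    def fit() -> Vector:
        return train_linear_svm([pool[i] for i in labeled], [labels[i] for i in labeled])

    while len(labeled) < budget:
        query = most_uncertain(fit(), pool, set(labels))
        if query is None:  # pool exhausted
            break
        labels[query] = oracle(query)
        labeled.append(query)
    return fit(), labeled


# Toy usage: 1-D points (plus a constant bias feature) labeled by their sign.
pool = [[-2.0 + 4.0 * k / 19, 1.0] for k in range(20)]


def oracle(i: int) -> int:
    return 1 if pool[i][0] > 0 else -1


w, labeled = active_learning(pool, oracle, seed_idx=[0, 19], budget=6)
print("labeled indices:", labeled)
```

Here `oracle` stands in for a human annotator and `budget` caps how many labels are requested; in a real AZ setting the pool would hold sentence feature vectors and the budget would play the role of the 500 labeled sentences mentioned in the abstract.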
doi_str_mv 10.1093/bioinformatics/btt163
format article
pmid 23564844
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2013-06, Vol.29 (11), p.1440-1447
issn 1367-4803
1367-4811
1367-4811
1460-2059
language eng
recordid cdi_swepub_primary_oai_swepub_ki_se_530256
source PubMed (Medline); Open Access: Oxford University Press Open Journals
subjects Artificial Intelligence
Library and Information Science
Data Mining - methods
Media and Communication Studies
Periodicals as Topic
Social Sciences
Support Vector Machine
title Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T18%3A57%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_swepu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Active%20learning-based%20information%20structure%20analysis%20of%20full%20scientific%20articles%20and%20two%20applications%20for%20biomedical%20literature%20review&rft.jtitle=Bioinformatics&rft.au=Guo,%20Yufan&rft.date=2013-06-01&rft.volume=29&rft.issue=11&rft.spage=1440&rft.epage=1447&rft.pages=1440-1447&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btt163&rft_dat=%3Cproquest_swepu%3E1372061737%3C/proquest_swepu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c477t-cb6988e1fc152095a8af92d93c7a33bbba6b4841344654359f8be78368cec4d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1355475448&rft_id=info:pmid/23564844&rfr_iscdi=true