Loading…

BioCreative III interactive task: an overview

Background: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text min...

Full description

Saved in:
Bibliographic Details
Main Authors: Arighi, Cecilia N, Roberts, Phoebe M, Shashank, Agarwal, Bhattacharya, Sanmitra, Gianni, Cesareni, Chatr-aryamontri, Andrew, Clematide, Simon, Gaudet, Pascale, Giglio, Michelle Gwinn, Harrow, Ian, Huala, Eva, Krallinger, Martin, Leser, Ulf, Li, Donghui, Liu, Feifan, Lu, Zhlyong, Maltais, Lois J, Okazaki, Naoaki, Perfetto, Livia, Rinaldi, Fabio, Sætre, Rune, Salgado, David, Srinivasan, Padmini, Thomas, Philippe E, Toldo, Luca, Hirschman, Lynette, Wu, Cathy H
Format: Article
Language:English
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Background: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested. Results: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task leading to an extensive list of gene ide