Partial Annotation Learning for Biomedical Entity Recognition
Published in: | IEEE journal of biomedical and health informatics 2024-09, Vol.PP, p.1-10 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Named Entity Recognition (NER) is a key task to support biomedical research. In Biomedical Named Entity Recognition (BioNER), obtaining high-quality expert-annotated data is laborious and expensive, leading to the development of automatic approaches such as distant supervision. However, manually and automatically generated data often suffer from the unlabeled entity problem, whereby many entity annotations are missing, degrading the performance of full annotation NER models. To address this issue, we systematically explore the efficacy of partial annotation learning methods for BioNER, evaluating them across a range of simulated scenarios of missing entity annotations. Furthermore, we propose a TS-PubMedBERT-Partial-CRF partial annotation learning model. We standardize a compilation of 16 BioNER corpora, covering five entity types, to establish a gold standard. We compare against the state-of-the-art partial annotation model EER-PubMedBERT, the widely acknowledged partial annotation model BiLSTM-Partial-CRF, and the state-of-the-art full annotation learning BioNER model, the PubMedBERT tagger. Results show that partial annotation learning-based methods can effectively learn from biomedical corpora with missing entity annotations. Our proposed model outperforms the alternatives and, specifically, the PubMedBERT tagger by 38% in F1-score under high missing entity rates. Moreover, the recall of entity mentions achieved by our model is competitive with the upper bound observed on the fully annotated dataset. We have published our data, source code and training records at https://github.com/possible1402/partial\_annotation\_learning. |
---|---|
ISSN: | 2168-2194 2168-2208 |
DOI: | 10.1109/JBHI.2024.3466294 |
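
The Summary refers to simulated scenarios of missing entity annotations. As a rough illustration of what such a simulation could look like, the sketch below removes whole entities from a BIO-tagged sentence at a chosen rate, turning them into unlabeled `O` tokens. This is an assumption-laden sketch, not the authors' protocol: the function names `entity_spans` and `drop_entities`, the `missing_rate` and `seed` parameters, and the uniform random removal scheme are all hypothetical choices made for this example.

```python
import random


def entity_spans(tags):
    """Return (start, end) index pairs (end exclusive) for each entity in a BIO sequence.

    Simplification: an I- tag always continues the current entity, regardless of its type.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def drop_entities(tags, missing_rate, seed=0):
    """Relabel each entity as 'O' with probability missing_rate (hypothetical scheme)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    noisy = list(tags)
    for start, end in entity_spans(tags):
        if rng.random() < missing_rate:
            noisy[start:end] = ["O"] * (end - start)
    return noisy


tags = ["B-Gene", "I-Gene", "O", "B-Disease", "O"]
print(entity_spans(tags))
print(drop_entities(tags, missing_rate=0.5))
```

Removing whole entities rather than individual tokens keeps the remaining annotations well-formed, which is one plausible way to mimic the unlabeled entity problem; other schemes (for example, dropping entities by type or by frequency) would need different sampling logic.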