Uncovering condition information loss in medical text extraction: The challenge of non-contiguous spans
Published in: | Next Research 2024-12, Vol.1 (2), p.100044, Article 100044 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • In medical texts, 18.6 % of patient condition phrases include unrelated information. • The accuracy of traditional NER and EL methods is limited. • We identified the types of entities that current extraction techniques often miss.
We investigated the limitations of conventional named entity recognition (NER) and entity linking (EL) methods in accurately extracting patient condition information from medical texts, focusing on the challenges posed by non-contiguous spans and the potential information loss. Using a corpus with entity-relation annotations, we analyzed the frequency and nature of non-contiguous spans that include irrelevant entities within their gaps. We further analyzed the corpus to identify the entity types most often associated with peripheral spans (those that do not contain the central symptom-describing terms), with a focus on items, body parts, and clinical tests. Our analysis revealed that 18.6 % of patient condition expressions were non-contiguous spans containing irrelevant entities, suggesting a worst-case accuracy ceiling of 81.4 % for conventional NER and EL approaches. The study highlights the importance of entity types such as items, body parts, and clinical tests in these expressions, indicating that conventional extraction methods incur considerable information loss. The findings underscore the need for more sophisticated information extraction techniques capable of handling the complexities of medical texts, including non-contiguous spans. Adapting methods that allow gaps within entities, or employing graph-based term assignments, can enhance the accuracy and comprehensiveness of medical text annotation. |
---|---|
ISSN: | 3050-4759 |
DOI: | 10.1016/j.nexres.2024.100044 |
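
The accuracy ceiling quoted in the summary follows from simple counting: any condition expression whose fragments are separated by a gap containing an unrelated entity cannot be captured exactly by a single contiguous span. The Python sketch below illustrates that counting on toy data. It is not the authors' code: the `Condition` class, the helper names, and the use of half-open character offsets are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: spans are half-open character offsets [start, end).
Span = tuple[int, int]


@dataclass
class Condition:
    """One patient condition expression, possibly split into fragments."""
    fragments: list[Span]


def gaps(cond: Condition) -> list[Span]:
    """Return the stretches of text between consecutive fragments."""
    frags = sorted(cond.fragments)
    return [
        (a_end, b_start)
        for (_, a_end), (b_start, _) in zip(frags, frags[1:])
        if b_start > a_end
    ]


def has_irrelevant_entity(cond: Condition, others: list[Span]) -> bool:
    """True if an unrelated entity lies wholly inside one of the gaps."""
    return any(
        g_start <= start and end <= g_end
        for g_start, g_end in gaps(cond)
        for start, end in others
    )


def accuracy_ceiling(examples: list[tuple[Condition, list[Span]]]) -> float:
    """Share of conditions a single contiguous span could still capture."""
    lost = sum(has_irrelevant_entity(cond, others) for cond, others in examples)
    return 1 - lost / len(examples)


# Hypothetical toy data: (condition, unrelated entities in the same sentence).
examples = [
    (Condition([(0, 10)]), [(20, 25)]),         # contiguous, nothing lost
    (Condition([(0, 5), (15, 20)]), [(7, 12)]), # gap holds an unrelated entity
    (Condition([(0, 4), (5, 9)]), []),          # gap, but no entity inside it
    (Condition([(0, 3)]), []),                  # contiguous, nothing lost
]

ceiling = accuracy_ceiling(examples)
print(ceiling)  # 0.75
```

Applied to the share reported in the summary, the same subtraction gives 100 % − 18.6 % = 81.4 %.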