Loading…

Transformer-based approach for symptom recognition and multilingual linking

This paper presents a transformer-based approach for symptom Named Entity Recognition (NER) in Spanish clinical texts and multilingual entity linking on the SympTEMIST dataset. For Spanish NER, we fine tune a RoBERTa-based token-level classifier with Bidirectional Long Short-Term Memory and conditio...

Full description

Saved in:
Bibliographic Details
Published in:Database : the journal of biological databases and curation 2024-09, Vol.2024
Main Authors: Vassileva, Sylvia, Grazhdanski, Georgi, Koychev, Ivan, Boytcheva, Svetla
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper presents a transformer-based approach for symptom Named Entity Recognition (NER) in Spanish clinical texts and multilingual entity linking on the SympTEMIST dataset. For Spanish NER, we fine tune a RoBERTa-based token-level classifier with Bidirectional Long Short-Term Memory and conditional random field layers on an augmented train set, achieving an F1 score of 0.73. Entity linking is performed via a hybrid approach with dictionaries, generating candidates from a knowledge base containing Unified Medical Language System aliases using the cross-lingual SapBERT and reranking the top candidates using GPT-3.5. The entity linking approach shows consistent results for multiple languages of 0.73 accuracy on the SympTEMIST multilingual dataset and also achieves an accuracy of 0.6123 on the Spanish entity linking task surpassing the current top score for this subtask. Database URL: https://github.com/svassileva/symptemist-multilingual-linking.
ISSN:1758-0463
1758-0463
DOI:10.1093/database/baae090