Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping

Bibliographic Details
Published in:	International journal of medical informatics (Shannon, Ireland), 2022-06, Vol.162, p.104753-104753, Article 104753
Main Authors:	Link, Nicholas B.; Huang, Sicong; Cai, Tianrun; Sun, Jiehuan; Dahal, Kumar; Costa, Lauren; Cho, Kelly; Liao, Katherine; Cai, Tianxi; Hong, Chuan
Format:	Article
Language:	English
Subjects:	Acronym disambiguation; Electronic health records; Natural language processing; Predictive modeling; Semantic embedding; Unsupervised learning

Description
Summary:	• Acronym disambiguation (identifying the meaning of an acronym) is important for information retrieval in clinical EHR systems.
• Most acronym disambiguation methods rely on manual annotation.
• We propose a novel unsupervised method, CASEml, that uses the surrounding words as well as visit information to disambiguate acronyms.
• CASEml performs as well as or better than a state-of-the-art knowledge-based method.
• We demonstrate the utility of CASEml for downstream NLP tasks using clinical EHR text.

The use of electronic health record (EHR) systems has grown over the past decade, and with it the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce an unsupervised method for binary acronym disambiguation, the task of classifying whether an acronym in clinical EHR notes carries a target sense. We developed an unsupervised ensemble machine learning algorithm (CASEml) that automatically identifies when an acronym means a target sense by leveraging semantic embeddings, visit-level text, and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard unsupervised method and a baseline metric that selects the most frequent acronym sense. In addition to evaluating these methods on specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than the standard baseline metric and (on average) higher than a state-of-the-art unsupervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and unsupervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
ISSN:	1386-5056 1872-8243
DOI:	10.1016/j.ijmedinf.2022.104753
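
Note:	The summary mentions a baseline that selects the most frequent acronym sense and reports results as accuracy. The sketch below illustrates one way such a baseline could be scored for a binary task. It is not the authors' code: the function names and the toy labels are hypothetical and chosen only for illustration, and the record does not say how the most frequent sense was determined in the study.

```python
from collections import Counter


def most_frequent_sense(labels):
    """Return the label that occurs most often (hypothetical helper)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("actual must not be empty")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy labels, invented for illustration only (not data from the study):
# 1 = acronym carries the target sense, 0 = any other sense.
reference_labels = [1, 1, 0, 1, 0, 1]
test_labels = [1, 0, 1, 1]

# The baseline ignores context and always predicts the majority sense.
majority = most_frequent_sense(reference_labels)
baseline_predictions = [majority] * len(test_labels)
baseline_accuracy = accuracy(baseline_predictions, test_labels)
```

A context-aware method only adds value to the extent that it beats this constant prediction, which is why the baseline is a natural point of comparison when one sense dominates.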