Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives

Bibliographic Details
Published in: AMIA ... Annual Symposium proceedings 2018, Vol.2018, p.663-672
Main Authors: Kim, Youngjun; Heider, Paul; Meystre, Stéphane
Format: Article
Language: English
Description
Summary: Text de-identification is an application of clinical natural language processing that offers significant efficiency and scalability advantages. Hence, various learning algorithms have been applied to this task to yield better performance. Instead of choosing the best individual learning algorithm, we aim to improve de-identification by constructing ensembles that lead to more accurate classification. We present three different ensemble methods that combine multiple de-identification models trained from deep learning, shallow learning, and rule-based approaches. Each model is capable of automated de-identification without manual medical expertise. Our experimental results show that the stacked learning ensemble is more effective than other ensemble methods, producing the highest recall, the most important metric for de-identification. The stacked ensemble achieved state-of-the-art performance on the 2014 i2b2 dataset with 97.04% precision, 94.45% recall, and 95.73% F score.
ISSN: 1559-4076
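
The three figures reported in the summary can be checked against one another. The sketch below assumes the "F score" is the balanced F1 measure (the harmonic mean of precision and recall); the record does not state this, and the function name `f_score` and its `beta` parameter are illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the record's "F score" is F1. The record does not say so.
    """
    if precision <= 0 and recall <= 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the summary, in percent.
precision = 97.04
recall = 94.45

f1 = f_score(precision, recall)
print(f"{f1:.2f}")  # prints 95.73
```

Under that assumption the computed value rounds to the reported 95.73%, so the three numbers are mutually consistent. A beta greater than 1 would weight recall more heavily, which matches the summary's remark that recall matters most for de-identification, though the record gives no indication that such a weighting was used.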