Loading…
Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives
Text de-identification is an application of clinical natural language processing that offers significant efficiency and scalability advantages. Hence, various learning algorithms have been applied to this task to yield better performance. Instead of choosing the best individual learning algorithm, w...
Saved in:
Published in: | AMIA ... Annual Symposium proceedings 2018, Vol.2018, p.663-672 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Text de-identification is an application of clinical natural language processing that offers significant efficiency and scalability advantages. Hence, various learning algorithms have been applied to this task to yield better performance. Instead of choosing the best individual learning algorithm, we aim to improve de-identification by constructing ensembles that lead to more accurate classification. We present three different ensemble methods that combine multiple de-identification models trained from deep learning, shallow learning, and rule-based approaches. Each model is capable of automated de-identification without manual medical expertise. Our experimental results show that the stacked learning ensemble is more effective than other ensemble methods, producing the highest recall, the most important metric for de-identification. The stacked ensemble achieved state-of-the-art performance on the 2014 i2b2 dataset with 97.04% precision, 94.45% recall, and 95.73% F
score. |
---|---|
ISSN: | 1559-4076 |