Prediction of intra-abdominal injury using natural language processing of electronic medical record data
Published in: | Surgery 2024-09, Vol.176 (3), p.577-585 |
---|---|
Main Authors: | , , , , , , , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This study aimed to use natural language processing to predict the presence of intra-abdominal injury using unstructured data from electronic medical records.
This was a random-sample retrospective observational cohort study leveraging unstructured data from injured patients taken to one of 9 acute care hospitals in an integrated health system between 2015 and 2021. Patients with International Classification of Diseases External Cause of Morbidity codes were identified. History and physical, consult, progress, and radiology report text from the first 8 hours of care was abstracted. Annotator dyads independently annotated each encounter's text files to establish ground truth regarding whether intra-abdominal injury occurred. Features were extracted from text using natural language processing techniques, bag of words, and principal component analysis. We tested logistic regression, random forests, and gradient boosting machines to determine the accuracy, recall, and precision of natural language processing in predicting intra-abdominal injury.
A random sample of 7,000 of 177,127 patient encounters was annotated. Only 2,951 had sufficient information to determine whether an intra-abdominal injury was present. Among those, 84 (2.9%) had an intra-abdominal injury. The concordance between annotators was 0.989. Logistic regression of features identified with bag of words and principal component analysis had the best predictive ability, with an area under the receiver operating characteristic curve of 0.9, recall of 0.73, and precision of 0.17. Text features with greatest importance included “abdomen,” “pelvis,” “spleen,” and “hematoma.”
Natural language processing could be a screening decision support tool, which, if paired with human clinical assessment, can maximize precision of intra-abdominal injury identification. |
---|---|
ISSN: | 0039-6060 1532-7361 |
DOI: | 10.1016/j.surg.2024.05.042 |
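
The summary describes a pipeline of bag-of-words features feeding a logistic regression classifier that is scored by recall and precision. The sketch below is one way such a pipeline might look using only the Python standard library. It is not the authors' implementation: the note snippets and labels are invented for illustration, the principal component analysis step is omitted for brevity, and every function name (`tokenize`, `build_vocab`, `vectorize`, `train_logreg`, `precision_recall`) is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy snippets with made-up labels (1 = intra-abdominal injury).
# They are illustrative only and are NOT data from the study.
NOTES = [
    ("ct abdomen pelvis shows spleen laceration with hematoma", 1),
    ("abdomen soft nontender no acute findings", 0),
    ("fall from ladder wrist fracture splint applied", 0),
    ("liver laceration with hematoma in abdomen", 1),
    ("headache after fall ct head negative", 0),
    ("ankle sprain no other injury", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Return the sorted list of distinct tokens across all texts."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def vectorize(text, vocab):
    """Bag of words: count how often each vocabulary word occurs."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def predict_proba(x, w, b):
    """Logistic model: sigmoid of the weighted sum of word counts."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=1000):
    """Fit weights with plain stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


texts = [text for text, _ in NOTES]
labels = [label for _, label in NOTES]
vocab = build_vocab(texts)
X = [vectorize(text, vocab) for text in texts]
w, b = train_logreg(X, labels)

# Scored on the training notes only, so these numbers say nothing about
# real-world performance; a real evaluation needs held-out data.
preds = [1 if predict_proba(x, w, b) >= 0.5 else 0 for x in X]
precision, recall = precision_recall(labels, preds)
top_terms = [word for _, word in sorted(zip(w, vocab), reverse=True)[:4]]
print("precision:", precision, "recall:", recall)
print("highest-weight terms:", top_terms)
```

On the reported figures, precision p implies (1 - p) / p false positives for each true positive, which is roughly 4.9 at p = 0.17. That ratio is consistent with the summary's framing of the model as a screening aid to be paired with human clinical assessment.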