Loading…
A Case Study of Data Quality in Text Mining Clinical Progress Notes
Text analytic methods are often aimed at extracting useful information from the vast array of unstructured, free format text documents that are created by almost all organizational processes. The success of any text mining application rests on the quality of the underlying data being analyzed, inclu...
Saved in:
Published in: | ACM transactions on management information systems 2015-04, Vol.6 (1), p.1-21 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Text analytic methods are often aimed at extracting useful information from the vast array of unstructured, free format text documents that are created by almost all organizational processes. The success of any text mining application rests on the quality of the underlying data being analyzed, including both predictive features and outcome labels. In this case study, some focused experiments regarding data quality are used to assess the robustness of Statistical Text Mining (STM) algorithms when applied to clinical progress notes. In particular, the experiments consider the impacts of task complexity (by removing signals), training set size, and target outcome quality. While this research is conducted using a dataset drawn from the medical domain, the data quality issues explored are of more general interest. |
---|---|
ISSN: | 2158-656X 2158-6578 |
DOI: | 10.1145/2669368 |