A Case Study of Data Quality in Text Mining Clinical Progress Notes

Bibliographic Details
Published in: ACM Transactions on Management Information Systems, 2015-04, Vol. 6 (1), p. 1-21
Main Authors: Berndt, Donald J.; McCart, James A.; Finch, Dezon K.; Luther, Stephen L.
Format: Article
Language: English
Description
Summary: Text analytic methods are often aimed at extracting useful information from the vast array of unstructured, free format text documents that are created by almost all organizational processes. The success of any text mining application rests on the quality of the underlying data being analyzed, including both predictive features and outcome labels. In this case study, some focused experiments regarding data quality are used to assess the robustness of Statistical Text Mining (STM) algorithms when applied to clinical progress notes. In particular, the experiments consider the impacts of task complexity (by removing signals), training set size, and target outcome quality. While this research is conducted using a dataset drawn from the medical domain, the data quality issues explored are of more general interest.
ISSN: 2158-656X, 2158-6578
DOI: 10.1145/2669368