Removing confounding factors via constraint-based clustering: An application to finding homogeneous groups of multiple sclerosis patients

Bibliographic Details
Published in: Artificial intelligence in medicine 2015-10, Vol.65 (2), p.79-88
Main Authors: Liu, Jingjing; Brodley, Carla E; Healy, Brian C; Chitnis, Tanuja
Format: Article
Language: English
Description
Summary:
Objectives: Confounding factors in unsupervised data can lead to undesirable clustering results. For example, in medical datasets, age is often a confounding factor in tests designed to judge the severity of a patient's disease through measures of mobility, eyesight and hearing. In such cases, removing age from each instance will not remove its effect from the data, because other features will be correlated with age. Motivated by the need to find homogeneous groups of multiple sclerosis (MS) patients, we apply our approach to remove physician subjectivity from patient data.
Methods: We present a method based on constraint-based clustering to remove the impact of such confounding factors. Suppose we know which feature (or set of features) is a confounding factor; call it F. Our method first partitions the data into b bins: if F is categorical, instances from the same category form one bin; if F is numeric, we split the data so that each bin contains instances with similar F values. Thus each instance is assigned to a single bin for factor F. We then remove feature F from each instance for the remaining steps. Next, we cluster the data separately in each bin. Using these clustering results, we generate pairwise constraints and then run a constraint-based clustering algorithm to produce a final grouping.
Results: In a series of experiments with synthetic datasets, we compare our proposed methods to detrending for numeric confounding factors. We apply our method to the Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's Hospital dataset, and find a novel grouping of patients that can help uncover the factors that impact disease progression in MS.
Conclusions: Our method groups data while removing the effect of confounding factors, without making any assumptions about the form of the influence of these factors on the other features. We identified clusters of MS patients that have clinically recognizable differences. Because patients more likely to progress are found using this approach, our results have the potential to aid physicians in tailoring treatment decisions for MS patients.
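The binning and constraint-generation steps described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary does not say which clustering algorithm is used inside each bin or how constraints are derived from the per-bin clusters, so this sketch assumes plain k-means, equal-frequency binning, must-link pairs for instances that share a within-bin cluster, and cannot-link pairs for instances in the same bin but different clusters. All function names and the toy data are hypothetical.

```python
from itertools import combinations

def equal_frequency_bins(f_values, n_bins):
    """Assign each instance to one of n_bins bins of similar F value (assumed scheme)."""
    order = sorted(range(len(f_values)), key=lambda i: f_values[i])
    bins = [0] * len(f_values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(f_values)
    return bins

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def constraints_from_bins(features, bins, k):
    """Cluster each bin separately, then emit must-link / cannot-link index pairs.

    Assumption: constraints are generated only between instances of the same bin.
    """
    must_link, cannot_link = [], []
    for b in sorted(set(bins)):
        idx = [i for i, bi in enumerate(bins) if bi == b]
        labels = kmeans([features[i] for i in idx], k)
        for (i, li), (j, lj) in combinations(zip(idx, labels), 2):
            (must_link if li == lj else cannot_link).append((i, j))
    return must_link, cannot_link

# Toy data: F is age; F is already removed from the remaining feature tuples.
ages = [30, 31, 32, 33, 60, 61, 62, 63]
scores = [(1.0,), (1.2,), (5.0,), (5.1,), (4.0,), (4.1,), (8.0,), (8.2,)]

bins = equal_frequency_bins(ages, n_bins=2)
must_link, cannot_link = constraints_from_bins(scores, bins, k=2)
print(must_link)
print(len(cannot_link))
```

In this toy data the score drifts upward with age, so clustering all instances at once would tend to group the higher-scoring younger patients with the lower-scoring older ones. Clustering within each age bin avoids that, and the resulting must-link and cannot-link lists would then be passed to a constraint-based clustering algorithm of one's choice to produce the final grouping; the summary does not name a specific one.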
ISSN: 0933-3657, 1873-2860
DOI: 10.1016/j.artmed.2015.06.004