gEM/GANN: A multivariate computational strategy for auto‐characterizing relationships between cellular and clinical phenotypes and predicting disease progression time using high‐dimensional flow cytometry data

Bibliographic Details
Published in: Cytometry. Part A 2015-07, Vol.87 (7), p.616-623
Main Authors: Tong, Dong Ling; Ball, Graham R.; Pockley, A. Graham
Format: Article
Language: English
Description
Summary: The dramatic increase in the complexity of flow cytometric datasets requires new computational approaches that can maximize the amount of information derived and overcome the limitations of traditional gating strategies. Herein, we present a multivariate computational analysis of the HIV‐infected flow cytometry datasets that were provided as part of the FlowCAP‐IV Challenge using unsupervised and supervised learning techniques. Out of 383 samples (stimulated and unstimulated), 191 samples were used as a training set (34 individuals whose disease did not progress, and 157 individuals whose disease did progress). Using the results from the training set, the participants in the Challenge were then asked to predict the condition and progression time of the remaining individuals (45 “nonprogressors” and 147 “progressors”). To achieve this, we first scaled down data resolution and then excluded doublet cells from the analysis using Expectation Maximization approaches. We then standardized all samples into histograms and used Genetic Algorithm‐Neural Network to extract feature sets from the datasets, the reliability of which was examined using WEKA‐implemented classifiers. The selected feature set resulted in a high sensitivity and specificity for the discrimination of progressors and nonprogressors in the training set (average True Positive Rate = 1.00 and average False Positive Rate = 0.033). The capacity of the feature set to predict real‐time survival time was better when using data from the “unstimulated” training set (r = 0.825). The P‐values and 95% confidence interval log‐rank ratios between actual and predicted survival time in the test set were 0.682 and 0.9542 ± 0.24 for the unstimulated dataset, and 0.4451 and 0.9173 ± 0.23 for the stimulated dataset.
Our analytic strategy has demonstrated a promising capacity to extract useful information from complex flow cytometry datasets, despite a significant imbalance and variation between the training and test sets. © 2015 International Society for Advancement of Cytometry
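
For readers unfamiliar with the two rates quoted in the summary, the sketch below shows how a True Positive Rate and a False Positive Rate are conventionally computed for a two-class problem. It is only an illustration of the standard definitions: the helper name `rates` and the predicted labels are hypothetical, chosen to match the training-set class sizes (157 progressors, 34 nonprogressors), and do not reproduce the article's classifiers or its actual confusion matrix. The article reports averages, which may have been computed differently.

```python
def rates(y_true, y_pred, positive="progressor"):
    """Return (TPR, FPR) for binary labels, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # TPR = TP / (TP + FN); FPR = FP / (FP + TN). Guard against empty classes.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels sized like the training set described in the summary.
y_true = ["progressor"] * 157 + ["nonprogressor"] * 34
# Hypothetical predictions: all progressors found, one nonprogressor misclassified.
y_pred = ["progressor"] * 157 + ["progressor"] + ["nonprogressor"] * 33

tpr, fpr = rates(y_true, y_pred)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

With class sizes this unbalanced, a single misclassified nonprogressor already moves the False Positive Rate by about three percentage points, which is worth keeping in mind when reading rates derived from the smaller class.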
ISSN: 1552-4922, 1552-4930
DOI: 10.1002/cyto.a.22622