Improved iterative pruning principal component analysis with graph-theoretic hierarchical clustering

Bibliographic Details
Main Authors: Amornbunchornvej, C., Limpiti, T., Assawamakin, A., Intarapanich, A., Tongsima, S.
Format: Conference Proceeding
Language: English
Description
Summary: Various unsupervised clustering algorithms have been used to infer population structure in genetic data. The goals are to separate individuals with similar genetic characteristics into clusters and to estimate the number of clusters in each dataset. Among them, a framework called iterative pruning principal component analysis (ipPCA) has been developed. It performs PCA iteratively on subsets of data samples and clusters them using fuzzy c-means. We believe that the choice of model-based clustering method affects the individual assignments and cluster quality, as well as the estimated number of clusters. Thus, in this paper we introduce into the ipPCA framework a hierarchical tree clustering concept from graph theory, whose performance is independent of cluster shape. We also add a PCA-based feature selection technique as a data pre-processing step to reduce the data dimension and increase computational efficiency. The resulting algorithm is called HiClust-ipPCA. We illustrate the improved clustering results of the HiClust-ipPCA algorithm using 47-breed bovine and 28-breed sheep datasets.
DOI: 10.1109/ECTICon.2012.6254120
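
The summary describes the iterative pruning idea only at a high level, so the following is a minimal sketch of that general pattern and not the authors' HiClust-ipPCA. It assumes a plain numeric sample-by-feature matrix, runs PCA on the current subset, and splits the subset by cutting the longest edge of a minimum spanning tree built on the PCA scores (a simple graph-based, single-linkage style split swapped in for illustration). The function names, the `min_size` and `gap_ratio` parameters, and the stopping rule are all hypothetical, and the PCA-based feature selection step is omitted.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(S))
    return U[:, :k] * S[:k]


def mst_edges(P):
    """Prim's algorithm on the complete Euclidean graph over the rows of P.

    Returns the n - 1 minimum spanning tree edges as (weight, i, j) tuples.
    """
    n = len(P)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()               # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node that provides that link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((float(best[j]), int(parent[j]), j))
        in_tree[j] = True
        update = (D[j] < best) & ~in_tree
        best[update] = D[j][update]
        parent[update] = j
    return edges


def split_on_longest_edge(P):
    """Cut the longest MST edge; return a mask for one side and a gap ratio."""
    edges = mst_edges(P)
    weights = np.array([w for w, _, _ in edges])
    cut = int(np.argmax(weights))
    adj = [[] for _ in range(len(P))]
    for e, (_, i, j) in enumerate(edges):
        if e != cut:
            adj[i].append(j)
            adj[j].append(i)
    # Depth-first search from one endpoint of the removed edge.
    side = np.zeros(len(P), dtype=bool)
    stack = [edges[cut][1]]
    side[stack[0]] = True
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if not side[v]:
                side[v] = True
                stack.append(v)
    # Hypothetical separation score: longest edge relative to a typical edge.
    ratio = weights[cut] / (np.median(weights) + 1e-12)
    return side, ratio


def iterative_pruning(X, min_size=5, gap_ratio=5.0):
    """Recursively split X; min_size and gap_ratio form an assumed stopping rule."""
    labels = np.zeros(len(X), dtype=int)
    pending = [np.arange(len(X))]
    next_label = 0
    while pending:
        idx = pending.pop()
        if len(idx) >= 2 * min_size:
            side, ratio = split_on_longest_edge(pca_scores(X[idx]))
            if ratio > gap_ratio and min(side.sum(), (~side).sum()) >= min_size:
                pending += [idx[side], idx[~side]]  # PCA is redone on each part
                continue
        labels[idx] = next_label  # subset is treated as one final cluster
        next_label += 1
    return labels
```

Calling `iterative_pruning(X)` returns one integer label per row of `X`, and the number of distinct labels serves as the estimated cluster count. Because the split follows graph connectivity rather than a fitted cluster model, it makes no assumption about cluster shape, but cutting a single edge is sensitive to outliers, which is why the sketch rejects splits that would produce a part smaller than `min_size`.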