Loading…
K-means Clustering and Principal Components Analysis of Microarray Data of L1000 Landmark Genes
Dimensionality reduction methods such as principal component analysis (PCA) are used to select relevant features, and k-means clustering performs well when applied to data with low effective dimensionality. This study integrated PCA and k-means clustering using the L1000 dataset, containing gene mic...
Saved in:
Published in: | Procedia computer science 2020, Vol.168, p.97-104 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Dimensionality reduction methods such as principal component analysis (PCA) are used to select relevant features, and k-means clustering performs well when applied to data with low effective dimensionality. This study integrated PCA and k-means clustering using the L1000 dataset, containing gene microarray data from 978 landmark genes, which have been previously shown to predict expression of ~81% of the remaining 21,290 target genes with low error. Groups within the L1000 dataset were characterized using both microarray and clinical metadata to assess whether 978 landmark genes would improve clustering results, compared to a random set of 978 genes. The role of clinical variables, including morphological diagnosis, were assessed across k-means clustering groups within homogeneous tissue samples in the L1000 dataset. Results show that the 978 landmark genes better differentiated k-means clusters, relative to 978 randomly selected non-landmark genes. K-means clusters generated from the landmark genes showed more separation of cluster groups when plotted against the first two principal components, which capture a greater proportion of variation for the 978 landmark genes. These results suggest that the 978 landmark genes better represent the overall genetic profile of these heterogeneous samples. Future studies will implement predictive analytics techniques to further investigate the interaction of microarray data and clinical variables such as cancer stage. |
---|---|
ISSN: | 1877-0509 1877-0509 |
DOI: | 10.1016/j.procs.2020.02.265 |