Imputation of missing values in DNA microarray gene expression data

Bibliographic Details
Main Authors: Hyunsoo Kim, Golub, G.H., Haesun Park
Format: Conference Proceeding
Language: English
Description
Summary: Most multivariate statistical methods for gene expression data require a complete matrix of gene array values. In this paper, an imputation method based on a least squares formulation is proposed to estimate missing values. It exploits local similarity structures in the data as well as a least squares optimization process. The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. This algorithm showed better performance than other imputation methods, such as k-nearest neighbor imputation and an imputation method based on Bayesian principal component analysis.
DOI: 10.1109/CSB.2004.1332500
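
The summary describes the core idea only in words: a target gene with missing values is written as a linear combination of similar genes. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The function name `lls_impute_row`, the parameter `k`, the use of Euclidean distance as the similarity measure, and the restriction of candidate genes to rows with no missing values are all assumptions made for this example; the record itself does not specify them.

```python
import numpy as np


def lls_impute_row(X, target, k=2):
    """Fill NaNs in row `target` of X (genes x experiments).

    Hypothetical helper: a sketch of local least squares imputation
    under the assumptions stated in the lead-in.
    """
    row = X[target]
    missing = np.isnan(row)
    if not missing.any():
        return row.copy()

    # Assumption: only genes with no missing values are candidates.
    complete = ~np.isnan(X).any(axis=1)
    candidates = X[complete]

    # Assumption: similarity is Euclidean distance, measured on the
    # columns where the target gene is observed.
    dists = np.linalg.norm(candidates[:, ~missing] - row[~missing], axis=1)
    nearest = candidates[np.argsort(dists)[:k]]

    # Least squares: find coefficients so that a linear combination of
    # the k similar genes matches the target on its observed columns.
    A = nearest[:, ~missing].T          # shape (n_observed, k)
    b = row[~missing]                   # shape (n_observed,)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Apply the same combination to the columns the target is missing.
    filled = row.copy()
    filled[missing] = nearest[:, missing].T @ coef
    return filled
```

Calling `lls_impute_row(X, i, k)` on a NumPy array `X` with `np.nan` marking missing entries returns a filled copy of row `i` and leaves `X` unchanged. The choice of `k` and of the similarity measure strongly affects the estimate, so both should be treated as tunable in any real use.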