Finding optimum width of discretization for gene expressions using functional annotations
Published in: | Computers in biology and medicine 2017-11, Vol.90, p.59-67 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Discretizing gene expression values is an important preprocessing step because it helps reduce noise and experimental error, which in turn improves results in tasks such as gene regulatory network analysis and disease prediction. A supervised discretization method for gene expressions that uses gene annotations is developed. The method, called “Gene Annotation Based Discretization” (GABD), determines the discretization width by maximizing the positive predictive value (PPV), computed from gene annotations, for the top 20,000 gene pairs. Expressions discretized with GABD capture gene similarity better than the original expressions. GABD is compared with existing discretization methods such as equal width discretization, equal frequency discretization and k-means discretization in terms of PPV. The utility of GABD is also shown by clustering genes with the k-medoid algorithm and thereby predicting the functions of 23 unclassified Saccharomyces cerevisiae genes at a p-value cutoff of 10⁻¹⁰. The source code for GABD is available at http://www.sampa.droppages.com/GABD.html.
• A method (GABD) is developed in which gene annotations are used to find the optimum width of discretization.
• Pearson correlation is used to compute similarity between expressions obtained using GABD.
• The optimum width is determined by maximizing the PPV of gene pairs having higher expression similarity.
• Functions of 23 unclassified Saccharomyces cerevisiae genes are predicted. |
---|---|
ISSN: | 0010-4825, 1879-0534 |
DOI: | 10.1016/j.compbiomed.2017.09.010 |
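
The width search described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the URL given in the summary). It rests on several assumptions that the abstract does not confirm: values are binned as floor(value / width), PPV is the fraction of the top-k most correlated gene pairs that share at least one annotation term, and the width is chosen from a caller-supplied candidate list. All function and parameter names below are hypothetical.

```python
import math
from itertools import combinations


def discretize(profile, width):
    """Map each expression value to the index of its fixed-width bin.

    Assumption: bin index = floor(value / width); the abstract does not
    state the exact binning rule.
    """
    return [math.floor(v / width) for v in profile]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def ppv_for_width(expr, annotations, width, top_k):
    """Fraction of the top_k most correlated gene pairs sharing an annotation.

    Assumption: a pair counts as a true positive when the two genes have
    at least one annotation term in common.
    """
    disc = {gene: discretize(profile, width) for gene, profile in expr.items()}
    scored = [
        (pearson(disc[a], disc[b]), a, b)
        for a, b in combinations(sorted(disc), 2)
    ]
    # Highest correlation first; ties keep their original pair order.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:top_k]
    if not top:
        return 0.0
    hits = sum(
        1 for _, a, b in top
        if annotations.get(a, set()) & annotations.get(b, set())
    )
    return hits / len(top)


def best_width(expr, annotations, candidate_widths, top_k):
    """Return the candidate width with the highest PPV (first one on ties)."""
    return max(
        candidate_widths,
        key=lambda w: ppv_for_width(expr, annotations, w, top_k),
    )
```

Here `expr` maps a gene name to its list of expression values and `annotations` maps a gene name to a set of annotation terms. The summary uses the top 20,000 gene pairs, so `top_k=20000` would mirror that setting on a full dataset. Scoring every pair is quadratic in the number of genes, so a genome-scale run would need a vectorized correlation matrix rather than this pure-Python loop.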