Loading…
MLSeq: Machine learning interface for RNA-sequencing data
•Our aim was to develop an analysis tool for classification of RNA-sequencing data.•We used R programming language and developed MLSeq package. It is distributed through Bioconductor network of R software. MLSeq is free and open-source.•MLSeq is the most comprehensive R package in Bioconductor netwo...
Saved in:
Published in: | Computer methods and programs in biomedicine 2019-07, Vol.175, p.223-231 |
---|---|
Main Authors: | , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | •Our aim was to develop an analysis tool for classification of RNA-sequencing data.•We used R programming language and developed MLSeq package. It is distributed through Bioconductor network of R software. MLSeq is free and open-source.•MLSeq is the most comprehensive R package in Bioconductor network including more than 80 machine learning algorithms.•MLSeq provides a pipeline to classify RNA-sequencing data including normalization, filtering and transformation steps.
In the last decade, RNA-sequencing technology has become method-of-choice and prefered to microarray technology for gene expression based classification and differential expression analysis since it produces less noisy data. Although there are many algorithms proposed for microarray data, the number of available algorithms and programs are limited for classification of RNA-sequencing data. For this reason, we developed MLSeq, to bring not only frequently used classification algorithms but also novel approaches together and make them available to be used for classification of RNA sequencing data. This package is developed using R language environment and distributed through BIOCONDUCTOR network.
Classification of RNA-sequencing data is not straightforward since raw data should be preprocessed before downstream analysis. With MLSeq package, researchers can easily preprocess (normalization, filtering, transformation etc.) and classify raw RNA-sequencing data using two strategies: (i) to perform algorithms which are directly proposed for RNA-sequencing data structure or (ii) to transform RNA-sequencing data in order to bring it distributionally closer to microarray data structure, and perform algorithms which are developed for microarray data. Moreover, we proposed novel algorithms such as voom (an acronym for variance modelling at observational level) based nearest shrunken centroids (voomNSC), diagonal linear discriminant analysis (voomDLDA), etc. through MLSeq.
Three real RNA-sequencing datasets (i.e cervical cancer, lung cancer and aging datasets) were used to evalute model performances. Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA) were selected as algorithms based on dicrete distributions, and voomNSC, nearest shrunken centroids (NSC) and support vector machines (SVM) were selected as algorithms based on continuous distributions for model comparisons. Each algorithm is compared using classification accuracies and sparsities on an ind |
---|---|
ISSN: | 0169-2607 1872-7565 |
DOI: | 10.1016/j.cmpb.2019.04.007 |