An adaptive workflow coupled with Random Forest algorithm to identify intact N-glycopeptides detected from mass spectrometry
Published in: | Bioinformatics (Oxford, England), 2014-07, Vol.30 (13), p.1908-1916 |
---|---|
Format: | Article |
Language: | English |
Summary: | Despite many attempts at algorithm development in recent years, automated identification of intact glycopeptides from LC-MS² spectral data remains a challenge in both sensitivity and precision. We implemented a supervised machine learning algorithm, Random Forest, in an automated workflow to identify N-glycopeptides using spectral features derived from ion trap-based LC-MS² data. The workflow streamlined high-confidence N-glycopeptide spectral data and enabled adaptive model optimization with respect to different sampling strategies, training sample sizes and feature sets. A critical evaluation of the features important for glycopeptide identification further facilitated effective feature selection for model improvement. Using a split-sample testing method on 577 high-confidence N-glycopeptide spectra, we demonstrated that an optimal true-positive rate, precision and false-positive rate of 73%, 88% and 10%, respectively, can be attained for overall N-glycopeptide identification. Availability and implementation: The workflow developed in this work and the application suite, Sweet-Heart, that the workflow supports for N-glycopeptide identification are available for download at http://sweet-heart.glycoproteomics.proteome.bc.sinica.edu.tw/. |
---|---|
ISSN: | 1367-4803, 1367-4811 |
DOI: | 10.1093/bioinformatics/btu139 |
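
The summary reports a true-positive rate, precision and false-positive rate obtained by split-sample testing. The sketch below illustrates how those three quantities are conventionally computed from held-out labels and predictions. It is a minimal illustration using only the Python standard library, not the Sweet-Heart implementation: the function names `split_sample` and `binary_metrics`, the 30% test fraction and the toy labels are all assumptions, since the record does not state how the 577 spectra were partitioned.

```python
import random


def split_sample(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists.

    The 30% default test fraction is an assumption for illustration only.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred):
    """Return true-positive rate, precision and false-positive rate.

    Labels are truthy for a positive (correct identification) and
    falsy for a negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,        # sensitivity / recall
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # positive predictive value
        "fpr": fp / (fp + tn) if fp + tn else 0.0,        # 1 - specificity
    }


if __name__ == "__main__":
    # Toy held-out labels and predictions (hypothetical, not from the paper).
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 1, 0, 0, 0]
    print(binary_metrics(truth, preds))
```

In a workflow like the one summarized, a classifier would be fitted on the training portion returned by `split_sample` and `binary_metrics` applied to its predictions on the test portion.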