
An adaptive workflow coupled with Random Forest algorithm to identify intact N-glycopeptides detected from mass spectrometry

Bibliographic Details
Published in: Bioinformatics (Oxford, England), 2014-07, Vol.30 (13), p.1908-1916
Main Authors: Liang, Suh-Yuen; Wu, Sz-Wei; Pu, Tsung-Hsien; Chang, Fang-Yu; Khoo, Kay-Hooi
Format: Article
Language: English
Description
Summary: Despite many algorithm development efforts in recent years, automated identification of intact glycopeptides from LC-MS(2) spectral data remains a challenge in both sensitivity and precision. We implemented a supervised machine learning algorithm, Random Forest, in an automated workflow to identify N-glycopeptides using spectral features derived from ion trap-based LC-MS(2) data. The workflow streamlined high-confidence N-glycopeptide spectral data and enabled adaptive model optimization with respect to different sampling strategies, training sample sizes and feature sets. A critical evaluation of the features important for glycopeptide identification further facilitated effective feature selection for model improvement. Using split-sample testing on 577 high-confidence N-glycopeptide spectra, we demonstrated that an optimal true-positive rate, precision and false-positive rate of 73%, 88% and 10%, respectively, can be attained for overall N-glycopeptide identification. Availability and implementation: The workflow developed in this work and the application suite, Sweet-Heart, that the workflow supports for N-glycopeptide identification are available for download at http://sweet-heart.glycoproteomics.proteome.bc.sinica.edu.tw/.
ISSN: 1367-4803, 1367-4811
DOI:10.1093/bioinformatics/btu139
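
The summary reports three evaluation figures: true-positive rate, precision and false-positive rate. As a minimal sketch of how such figures are conventionally derived from a held-out test split, the following computes them from binary labels. The function name `identification_metrics` and the toy labels are hypothetical, chosen only for illustration; they are not taken from the paper or from Sweet-Heart, and no Random Forest model is trained here.

```python
from typing import Dict, Sequence


def identification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return true-positive rate, precision and false-positive rate.

    Labels are 1 (spectrum is a true N-glycopeptide match) or 0 (it is not).
    The function name and label encoding are assumptions for illustration.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def ratio(numerator: int, denominator: int) -> float:
        # Report NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "tpr": ratio(tp, tp + fn),        # sensitivity: TP / (TP + FN)
        "precision": ratio(tp, tp + fp),  # TP / (TP + FP)
        "fpr": ratio(fp, fp + tn),        # FP / (FP + TN)
    }


# Toy held-out labels (hypothetical, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = identification_metrics(y_true, y_pred)
print(metrics)
```

In a split-sample setting, `y_pred` would come from a classifier fitted on the training portion only and applied to the held-out portion, so that the three rates describe performance on unseen spectra.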