Unsupervised ensemble learning for genome sequencing
Published in: | Pattern recognition 2022-09, Vol.129, p.108721, Article 108721 |
---|---|
Format: | Article |
Language: | English |
Summary: | • The variant calling step in next-generation sequencing technologies is formulated as a classification problem.
• An unsupervised ensemble classification method is proposed as a variant caller for DNA sequencing.
• An EM-based variant calling algorithm that estimates the maximum a posteriori class to take a decision is presented.
• The number of classes to be decided is greater than the number of different labels that are observed.
• Experimental results with real human DNA sequencing data support the approach.
Unsupervised ensemble learning refers to methods that, for a particular task, combine the data provided by several decision learners while taking into account each learner's reliability, which is usually inferred from the data. Here, the variant calling step of next-generation sequencing technologies is formulated as an unsupervised ensemble classification problem. A variant calling algorithm based on the expectation-maximization (EM) algorithm is then proposed; it estimates the maximum a posteriori decision among a number of classes that is larger than the number of different labels provided by the learners. Experimental results with real human DNA sequencing data show that the proposed algorithm is competitive with state-of-the-art variant callers such as GATK, HTSLIB, and Platypus. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2022.108721 |
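
The summary above describes an EM procedure that weighs learners by an inferred reliability and takes a maximum a posteriori decision over more classes than there are observed labels. The following is only a minimal sketch of that general idea, written as a generic Dawid-Skene-style EM; it is not the algorithm from the article. The function names `em_ensemble` and `map_decision`, the `init_conf` starting guess, the smoothing constant, and the toy votes are all assumptions made for illustration. The toy setup (two observable labels, three classes, where the middle class emits both labels equally often) is likewise hypothetical.

```python
import math


def em_ensemble(votes, init_conf, n_iter=30, smooth=1e-2):
    """Generic Dawid-Skene-style EM (illustrative sketch, not the article's method).

    votes[i][m]     : label in 0..L-1 that learner m assigned to item i
    init_conf[k][l] : assumed initial guess of P(label l | class k), copied to
                      every learner; the number of rows K may exceed L
    Returns (post, priors, conf) with post[i][k] and conf[m][k][l].
    """
    n_items, n_learners = len(votes), len(votes[0])
    n_classes, n_labels = len(init_conf), len(init_conf[0])
    priors = [1.0 / n_classes] * n_classes
    conf = [[row[:] for row in init_conf] for _ in range(n_learners)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior over classes for each item, computed in log space.
        post = []
        for item in votes:
            logp = [
                math.log(priors[k])
                + sum(math.log(conf[m][k][item[m]]) for m in range(n_learners))
                for k in range(n_classes)
            ]
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            post.append([w / total for w in weights])
        # M-step: re-estimate class priors and per-learner confusion matrices.
        # Additive smoothing keeps every probability strictly positive.
        for k in range(n_classes):
            mass = sum(p[k] for p in post)
            priors[k] = (mass + smooth) / (n_items + n_classes * smooth)
            for m in range(n_learners):
                for l in range(n_labels):
                    hits = sum(
                        p[k] for p, item in zip(post, votes) if item[m] == l
                    )
                    conf[m][k][l] = (hits + smooth) / (mass + n_labels * smooth)
    return post, priors, conf


def map_decision(post):
    """Index of the maximum a posteriori class for each item."""
    return [max(range(len(p)), key=p.__getitem__) for p in post]


# Hypothetical toy input: 12 items, 6 learners, labels {0, 1}, classes {0, 1, 2}.
votes = (
    [[0] * 6 for _ in range(4)]      # unanimous label 0
    + [[1] * 6 for _ in range(4)]    # unanimous label 1
    + [
        [0, 1, 0, 1, 0, 1],          # evenly split votes
        [1, 0, 1, 0, 1, 0],
        [0, 1, 1, 0, 0, 1],
        [1, 0, 0, 1, 1, 0],
    ]
)
# Assumed starting point: class 0 mostly emits label 0, class 2 mostly emits
# label 1, and class 1 emits either label with equal probability.
init_conf = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

post, priors, conf = em_ensemble(votes, init_conf)
decisions = map_decision(post)
print(decisions)
```

In this sketch the third class is only distinguishable because the starting confusion matrix encodes how it differs from the other two; with a less informative initialization, EM may settle on a different labeling of the classes.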