
An Empirical Comparison of EM and K-means Algorithms for Binning Metagenomics Datasets


Bibliographic Details
Published in: Ingeniare: Revista Chilena de Ingenieria 2018-11, Vol. 26 (suppl 1), p. 20-27
Main Authors: Tapia Reyes, Patricio; Meneses Villegas, Claudio
Format: Article
Language: English
Description
Summary: Metagenomics is an area of microbiology that deals with the taxonomic classification of genomic samples taken directly from the environment. These samples are sequences of variable length that may correspond to different species, some of which may be unknown or not previously stored in a genomic database. One of the main steps in metagenomics classification is binning the sequence fragments into groups, each of which may correspond to one species. Many approaches are used to perform binning, mainly machine learning algorithms for classification or clustering. This paper presents the results of an empirical evaluation of two well-known unsupervised algorithms for the metagenomics binning task: EM and K-means. Both algorithms are tested on synthetic datasets of short and long reads, with different proportions and numbers of species. The empirical results show that K-means generally outperforms EM, but EM is competitive on several of the short-read datasets used for evaluation.
ISSN: 0718-3305
0718-3291
DOI: 10.4067/S0718-33052018000500020