Loading…

First-line drug resistance profiling of Mycobacterium tuberculosis : a machine learning approach

The persistence and emergence of new multi-drug resistant Mycobacterium tuberculosis (M. tb) strains continues to advance the devastating tuberculosis (TB) epidemic. Robust systems are needed to accurately and rapidly perform drug-resistance profiling, and machine learning (ML) methods combined with...

Full description

Saved in:
Bibliographic Details
Published in:AMIA ... Annual Symposium proceedings 2021, Vol.2021, p.891-899
Main Authors: Müller, Stephanie J, Meraba, Rebone L, Dlamini, Gciniwe S, Mapiye, Darlington S
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The persistence and emergence of new multi-drug resistant Mycobacterium tuberculosis (M. tb) strains continues to advance the devastating tuberculosis (TB) epidemic. Robust systems are needed to accurately and rapidly perform drug-resistance profiling, and machine learning (ML) methods combined with genomic sequence data may provide novel insights into drug-resistance mechanisms. Using 372 M. tb isolates, the combined utility of ML and bioinformatics to perform drug-resistance profiling is demonstrated. SNPs, InDels, and dinucleotide frequencies are explored as input features for three ML models, namely Decision Trees, Random Forest, and the eXtreme Gradient Boosted model. Using SNPs and InDels, all three models performed equally well yielding a 99% accuracy, 97% recall, and 99% F1-score. Using dinucleotide frequencies, the XGBoost algorithm was superior with a 97% accuracy, 94% recall and 97% F1-score. This study validates the use of variants and presents dinucleotide features as another effective feature encoding method for ML-based phenotype classification.
ISSN:1559-4076