An Explainable Deep Learning Classifier of Bovine Mastitis Based on Whole-Genome Sequence Data - Circumventing the p >> n Problem

Bibliographic Details
Published in: International Journal of Molecular Sciences, 2024-05, Vol. 25 (9), p. 4715
Main Authors: Kotlarz, Krzysztof, Mielczarek, Magda, Biecek, Przemysław, Wojdak-Maksymiec, Katarzyna, Suchocki, Tomasz, Topolski, Piotr, Jagusiak, Wojciech, Szyda, Joanna
Format: Article
Language: English
Description
Summary: The serious drawback underlying the biological annotation of whole-genome sequence data is the p >> n problem, which means that the number of polymorphic variants (p) is much larger than the number of available phenotypic records (n). We propose a way to circumvent the problem by combining a LASSO logistic regression with deep learning to classify cows as susceptible or resistant to mastitis, based on single nucleotide polymorphism (SNP) genotypes. Among several architectures, the one with 204,642 SNPs was selected as the best. This architecture was composed of two layers with, respectively, 7 and 46 units per layer implementing respective drop-out rates of 0.210 and 0.358. The classification of the test data resulted in AUC = 0.750, accuracy = 0.650, sensitivity = 0.600, and specificity = 0.700. Significant SNPs were selected based on the SHapley Additive exPlanation (SHAP). As a final result, one GO term related to the biological process and thirteen GO terms related to molecular function were significantly enriched in the gene set that corresponded to the significant SNPs. Our findings revealed that the optimal approach can correctly predict susceptibility or resistance status for approximately 65% of cows. Genes marked by the most significant SNPs are related to the immune response and protein synthesis.
ISSN: 1422-0067, 1661-6596
DOI: 10.3390/ijms25094715
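
Note on the reported metrics: the summary gives accuracy = 0.650, sensitivity = 0.600, and specificity = 0.700 for the test data. The minimal Python sketch below shows how these three quantities follow from confusion-matrix counts. The counts are hypothetical, chosen only so that they reproduce the reported values; the summary does not state the size or class balance of the test set. Treating "susceptible" as the positive class is also an assumption, and the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true positives among all actual positives
    specificity = tn / (tn + fp)                # true negatives among all actual negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # correct calls among all animals
    return sensitivity, specificity, accuracy


# Hypothetical test set (NOT from the record): 20 susceptible and 20 resistant cows,
# with "susceptible" assumed to be the positive class.
sens, spec, acc = classification_metrics(tp=12, fn=8, tn=14, fp=6)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With equal class sizes, accuracy is the mean of sensitivity and specificity, so the reported values would be consistent with a balanced test set. That is an inference from the arithmetic, not something the record states.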