M3S-ALG: Improved and robust prediction of allergenicity of chemical compounds by using a novel multi-step stacking strategy

Bibliographic Details
Published in: Future Generation Computer Systems, 2025-01, Vol. 162, p. 107455, Article 107455
Main Authors: Charoenkwan, Phasit; Schaduangrat, Nalini; Phan, Le Thi; Manavalan, Balachandran; Shoombuatong, Watshara
Format: Article
Language: English
Description
Summary: A wide variety of chemicals cannot be introduced to the marketplace because of their high allergenicity. It is therefore crucial to assess the allergenic potential of chemicals before introducing them into clinical therapeutics. However, assessing the allergenicity of chemical compounds experimentally is time-consuming and costly. To tackle this challenge, we propose M3S-ALG, a model built with a novel multi-step stacking strategy (M3S) for rapid and accurate identification of the allergenicity of chemical compounds using only their SMILES notation. The proposed M3S method involves three steps. First, ten different balanced datasets were constructed using an under-sampling approach. Second, for each balanced dataset, 144 base-classifiers were trained and optimized, and their prediction scores for allergenic chemical compounds were treated as new probabilistic features. Third, we selected the important probabilistic features and used them to construct the final stacked model (M3S-ALG). Experimental results show that M3S-ALG outperforms conventional ensemble strategies and its constituent base-classifiers on both the training and independent test datasets, which indicates the effectiveness and robustness of the proposed strategy in identifying the allergenicity of chemical compounds. In addition, M3S-ALG outperformed existing methods on the independent test dataset, achieving a balanced accuracy of 0.877, an MCC of 0.712, and an AUC of 0.931. Finally, we developed a user-friendly online web server at https://pmlabqsar.pythonanywhere.com/M3SALG. This new approach is anticipated to help the drug discovery and development community identify chemical compounds with no allergenic properties at large scale.
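The three steps described in the summary can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation: the toy 2-D vectors stand in for descriptors that would be derived from SMILES, a single nearest-centroid scorer stands in for the 144 base-classifiers per balanced dataset, the mean-gap selection rule and the logistic meta-learner are assumptions, and every function name is hypothetical. For brevity the meta-learner is fitted on in-sample scores, whereas a careful stacking setup would normally use out-of-fold scores.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def balanced_subsets(labels: Sequence[int], n_subsets: int = 10, seed: int = 0) -> List[List[int]]:
    """Step 1: random under-sampling. Each subset keeps every minority-class
    index plus an equally sized random draw from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(labels) if t == 1]
    neg = [i for i, t in enumerate(labels) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [sorted(minority + rng.sample(majority, len(minority))) for _ in range(n_subsets)]


def fit_centroid_scorer(X: Sequence[Vector], y: Sequence[int]) -> Scorer:
    """Step 2 (stand-in base learner): score = sigmoid(d_neg - d_pos), where d_*
    is the Euclidean distance to the class centroid. Not one of the paper's models."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)
    return lambda x: 1.0 / (1.0 + math.exp(math.dist(x, c_pos) - math.dist(x, c_neg)))


def select_features(F: Sequence[Vector], y: Sequence[int], keep: int) -> List[int]:
    """Step 3a (assumed criterion): keep the columns whose class means differ most."""
    def gap(j: int) -> float:
        pos = [f[j] for f, t in zip(F, y) if t == 1]
        neg = [f[j] for f, t in zip(F, y) if t == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

    return sorted(range(len(F[0])), key=gap, reverse=True)[:keep]


def fit_logistic(F: Sequence[Vector], y: Sequence[int],
                 epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Step 3b (assumed meta-learner): logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(F[0]), 0.0, len(F)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, t in zip(F, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * fi for wi, fi in zip(w, f)) + b)))
            for j, fj in enumerate(f):
                grad_w[j] += (p - t) * fj
            grad_b += p - t
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy imbalanced data: 20 positives near (2, 2), 80 negatives near (0, 0).
data_rng = random.Random(42)
X = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)] + \
    [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
y = [1] * 20 + [0] * 80

subsets = balanced_subsets(y, n_subsets=10, seed=1)                      # step 1
scorers = [fit_centroid_scorer([X[i] for i in idx], [y[i] for i in idx])
           for idx in subsets]                                           # step 2
F = [[score(x) for score in scorers] for x in X]                         # probabilistic features
kept = select_features(F, y, keep=5)                                     # step 3a
w, b = fit_logistic([[f[j] for j in kept] for f in F], y)                # step 3b


def predict_proba(x: Vector) -> float:
    """Stacked prediction: base scores on the kept columns, then the meta-learner."""
    z = sum(wi * scorers[j](x) for wi, j in zip(w, kept)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Calling `predict_proba([2.0, 2.0])` and `predict_proba([0.0, 0.0])` shows how a point near the positive cluster receives a higher stacked score than one near the negative cluster. With ten subsets and one scorer each, the sketch yields ten probabilistic features; the setup in the summary would yield 144 per balanced dataset before selection.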
• M3S is a novel multi-step stacking strategy for addressing the data imbalance problem.
• M3S-ALG is a high-accuracy model for identifying allergenic chemical compounds.
• The proposed M3S-ALG outperforms the existing methods on the independent test dataset.
• A web server of M3S-ALG is available at https://pmlabqsar.pythonanywhere.com/M3SALG.
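For readers unfamiliar with the three metrics quoted in the summary (balanced accuracy, MCC, and AUC), the following Python sketch shows their standard definitions. The function names and the confusion-matrix counts in the usage lines are hypothetical and are not taken from the paper; they only illustrate how such figures are typically computed on an independent test set.

```python
import math
from typing import Sequence


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half a win."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
example_bacc = balanced_accuracy(tp=8, tn=9, fp=1, fn=2)
example_mcc = mcc(tp=8, tn=9, fp=1, fn=2)
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Balanced accuracy and MCC are common choices for imbalanced data because neither is dominated by the majority class, which is consistent with the data imbalance problem that M3S is designed to address.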
ISSN: 0167-739X
DOI: 10.1016/j.future.2024.07.033