Investigation of Machine Intelligence in Compound Cell Activity Classification

Bibliographic Details
Published in: Molecular Pharmaceutics, 2019-11, Vol. 16 (11), p. 4472-4484
Main Authors: Fan, Yuanrong, Zhang, Yanmin, Hua, Yi, Wang, Yuchen, Zhu, Lu, Zhao, Junnan, Yang, Yan, Chen, Xingye, Lu, Shuai, Lu, Tao, Chen, Yadong, Liu, Haichun
Format: Article
Language: English
Description
Summary: Machine intelligence has developed greatly over the past decades and is now widely used in many fields. In recent years, many reports have shown satisfactory results in drug discovery. In this study, machine intelligence methods were explored to assist cell activity prediction. Multiple methods, including support vector machine, decision tree, random forest, extra trees, gradient boosting machine, convolutional neural network, long short-term memory network, and gated recurrent unit network, were employed to classify compounds by their cell activity. Unlike some previously reported classification models, compounds were represented as strings in the simplified molecular input line entry system (SMILES) and used directly as input instead of chemical descriptors, an approach that mimics natural language processing. Both the single cell strain and the whole data set were examined under balanced and imbalanced data distributions. Different activity cutoffs were set for the single cell strain (Z-score = 3) and the whole data set (Z-score = 5 and 6). Nine metrics were used to evaluate the models: accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve score, Cohen’s κ, Brier score, Matthews correlation coefficient, and balanced accuracy. The results show that the gradient boosting machine performs well on the balanced data distribution, while the convolutional neural network is suited to the imbalanced one. They also demonstrate that both classic machine learning methods and deep learning methods have potential for classifying compound cell activity.
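The summary describes three steps that a short sketch may make more concrete: feeding SMILES strings to a model as text, labeling compounds by a Z-score cutoff, and scoring predictions with several of the listed metrics. The Python below is only an illustration under stated assumptions, not the authors' pipeline. The function names are hypothetical, the character-level tokenization is assumed (it splits two-letter atoms such as Cl), and treating "Z-score at or above the cutoff" as active is an assumption about the direction of the threshold. Only a subset of the nine metrics is shown; the area under the ROC curve is omitted.

```python
import math

def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to an integer; 0 is padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(smiles, vocab, max_len):
    """Character-level integer encoding (assumed scheme), padded or truncated to max_len."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))

def label_by_zscore(z_scores, cutoff):
    """Assumption: a compound counts as active (1) when its Z-score is >= cutoff."""
    return [1 if z >= cutoff else 0 for z in z_scores]

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels; undefined ratios fall back to 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    def ratio(a, b):
        return a / b if b else 0.0

    accuracy = ratio(tp + tn, n)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Agreement expected by chance, from the marginal label frequencies.
    p_chance = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - p_chance, 1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": mcc,
        "cohen_kappa": kappa,
    }

# Toy usage with made-up values (not data from the article).
smiles = ["CCO", "c1ccccc1"]
vocab = build_vocab(smiles)
encoded = [encode(s, vocab, max_len=10) for s in smiles]
labels = label_by_zscore([1.2, 3.4], cutoff=3.0)
```

Balanced accuracy, the Matthews correlation coefficient, and Cohen’s κ are the metrics in this list that stay informative when active compounds are rare, which is presumably why the study reports them alongside plain accuracy for the imbalanced setting.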
ISSN: 1543-8384, 1543-8392
DOI: 10.1021/acs.molpharmaceut.9b00558