Loading…
A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities
Quantitative structure–activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One...
Saved in:
Published in: | Journal of computer-aided molecular design 2018-02, Vol.32 (2), p.375-384 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Quantitative structure–activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine–protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods. |
---|---|
ISSN: | 0920-654X 1573-4951 |
DOI: | 10.1007/s10822-017-0094-6 |