Novel criteria for elimination of the outliers in QSPR studies, when the ‘forward stepwise’ procedure is used

Bibliographic Details
Published in: Journal of mathematical chemistry 2019-08, Vol. 57 (7), p. 1770-1796
Main Author: Tarko, Laszlo
Format: Article
Language: English
Description
Summary: The characteristics of the proposed algorithm are: (a) a new formula is used for the quality of the QSPRs; (b) the outlier (atypical) character is defined using a classic criterion; (c) the condition for elimination of the outliers includes the quality of the equation; (d) only ‘the most atypical’ molecule is eliminated, and all calculations are then automatically repeated; (e) the elimination of outliers stops if the condition for elimination is not fulfilled or if the number of eliminated molecules exceeds a predetermined limit. The second situation in (e) was encountered once in the four examples discussed. The number of descriptors in ‘the best’ equation and the number of outliers removed cannot be predicted a priori. The text also proposes a criterion for the identification of ‘outliers for lead hopping’. There were no molecules of this type in the four examples discussed. The initial number of molecules in the calibration sets was 50, 60, 133, and 54, respectively; the number of descriptors in ‘the best’ equations was 5, 5, 9, and 9, respectively; and the number of eliminated outliers was 0, 0, 8, and 6, respectively. When outliers were present, the best equation obtained with the outliers and the best equation obtained without them were very different.
ISSN: 0259-9791, 1572-8897
DOI: 10.1007/s10910-019-01036-x
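
The elimination loop in points (d) and (e) of the summary can be illustrated with a minimal, dependency-free Python sketch. This record does not give the paper's quality formula, its outlier criterion, or its elimination condition, so the sketch substitutes clearly labelled stand-ins: adjusted R² as the quality score, a standardized residual above `z_limit` as the outlier test, and "removal must improve the quality score" as the elimination condition. All function names, parameters, and default values here are hypothetical, and the forward stepwise selection of descriptors is not reproduced.

```python
from math import sqrt

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def residuals(X, y):
    """Residuals of an ordinary least-squares fit with intercept."""
    A = [[1.0] + list(row) for row in X]
    m = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(m)] for i in range(m)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(m)]
    coef = solve(AtA, Aty)
    return [yi - sum(c * v for c, v in zip(coef, a)) for a, yi in zip(A, y)]

def adjusted_r2(X, y):
    """Stand-in quality score (ASSUMPTION: not the formula from the paper)."""
    n, p = len(y), len(X[0])
    ss_res = sum(r * r for r in residuals(X, y))
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def eliminate_outliers(X, y, z_limit=2.5, max_removed=10):
    """Remove one molecule at a time, refitting after every removal."""
    p = len(X[0])
    keep, removed = list(range(len(y))), []
    # Stop at the predetermined limit, or when too few molecules remain to refit.
    while len(removed) < max_removed and len(keep) > p + 3:
        Xk = [X[i] for i in keep]
        yk = [y[i] for i in keep]
        res = residuals(Xk, yk)
        s = sqrt(sum(r * r for r in res) / (len(keep) - p - 1))
        # Only the single most atypical molecule is considered per pass.
        worst = max(range(len(keep)), key=lambda j: abs(res[j]))
        # ASSUMPTION: outlier means |residual| / s above z_limit.
        if s == 0.0 or abs(res[worst]) / s <= z_limit:
            break
        trial = keep[:worst] + keep[worst + 1:]
        # ASSUMPTION: elimination is accepted only if the quality score improves.
        if adjusted_r2([X[i] for i in trial], [y[i] for i in trial]) <= adjusted_r2(Xk, yk):
            break
        removed.append(keep[worst])
        keep = trial
    return keep, removed
```

Calling `eliminate_outliers(X, y)` with a descriptor matrix `X` (a list of rows) and property values `y` returns the indices kept and the indices removed, in order of removal. Refitting after each single removal mirrors the idea that eliminating one atypical molecule can change which molecules look atypical next.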