Accurate prediction of isothermal gas chromatographic Kováts retention indices

Bibliographic Details
Published in: Journal of Chromatography A 2023-08, Vol. 1705, p. 464176, Article 464176
Main Authors: Anjum, Afia; Liigand, Jaanus; Milford, Ralph; Gautam, Vasuk; Wishart, David S.
Format: Article
Language: English
Description
Summary:
• A public webserver RIpred 1.0 was deployed for isothermal Kováts retention index prediction.
• RIpred handles underivatized and derivatized compounds in three stationary GC phases.
• Similar performance was achieved when comparing with the commercial NIST predictor.
• The mean absolute percentage error (MAPE) of the models is within 3% in most cases.
• RIpred 1.0 predicted ∼5 million RI values for all GC-amenable compounds in HMDB.

We describe a freely available web server called Retention Index Predictor (RIpred) (https://ripred.ca) that rapidly and accurately predicts Gas Chromatographic Kováts Retention Indices (RI) using SMILES strings as chemical structure input. RIpred performs RI prediction for three different stationary phases (semi-standard non-polar (SSNP), standard non-polar (SNP), and standard polar (SP)) for both derivatized (trimethylsilyl (TMS) and tert‑butyldimethylsilyl (TBDMS) derivatized) and underivatized (base compound) forms of GC-amenable structures. RIpred was developed to address the need for freely available, fast, highly accurate RI predictions for a wide range of derivatized and underivatized chemicals for all common GC stationary phases. RIpred was trained using a Graph Neural Network (GNN) that used compound structures, their extracted features (mostly atom-level features) and the GC-RI data from the National Institute of Standards and Technology databases (NIST 17 and NIST 20). We curated this NIST 17 and NIST 20 GC-RI data, which is available for all three stationary phases, to create appropriate inputs (molecular graphs in this case) needed to enhance our model performance. The performance of different RIpred predictive models was evaluated using 10-fold cross validation (CV). The best performing RIpred models were identified and when tested on hold-out test sets from all stationary phases, achieved a Mean Absolute Error (MAE) of
ISSN: 0021-9673
DOI: 10.1016/j.chroma.2023.464176
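
The summary reports model quality as a Mean Absolute Error (MAE) and a mean absolute percentage error (MAPE). As a minimal sketch of how these two metrics are conventionally computed for retention indices, the Python below uses their standard textbook definitions. The function names and the RI values are hypothetical, chosen only for illustration; they are not taken from the paper or from RIpred.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference, in RI units."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """Average absolute difference relative to the observed value, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be non-zero")
    ratios = (abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * sum(ratios) / len(observed)


# Hypothetical retention indices, made up for illustration only.
observed_ri = [1000.0, 1200.0, 1500.0, 2000.0]
predicted_ri = [1010.0, 1176.0, 1530.0, 1980.0]

mae = mean_absolute_error(observed_ri, predicted_ri)
mape = mean_absolute_percentage_error(observed_ri, predicted_ri)
print(f"MAE = {mae:.1f} RI units, MAPE = {mape:.2f}%")
```

MAE is expressed in retention index units, while MAPE scales each error by the observed value, so the two can rank models differently when the compounds span a wide RI range.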