Deep transfer learning for predicting frontier orbital energies of organic materials using small data and its application to porphyrin photocatalysts

Bibliographic Details
Published in: Physical chemistry chemical physics : PCCP, 2023-04, Vol. 25 (15), p. 1536-1549
Main Authors: Su, An; Zhang, Xin; Zhang, Chengwei; Ding, Debo; Yang, Yun-Fang; Wang, Keke; She, Yuan-Bin
Format: Article
Language: English
Description
Summary: Machine learning (ML) models have received increasing attention as a new approach for the virtual screening of organic materials. Although some ML models trained on large databases have achieved high prediction accuracy, the application of ML to certain types of organic materials is limited by the small amount of available data. Meanwhile, metalloporphyrins and porphyrins (MpPs) have attracted growing interest as potential photocatalysts, and recent studies have found that both the HOMO/LUMO energy levels and the energy gaps are important factors controlling MpP photocatalysts. Since the training data for MpPs are insufficient and limited to porphyrin-based dyes, in this study we proposed a deep transfer learning approach to rapidly predict the HOMO/LUMO energy levels and energy gaps of MpPs. To complement the open-source Porphyrin-based Dyes Database (PBDD), we curated a new database, the Metalloporphyrins and Porphyrins Database (MpPD), in which MpPs were specifically designed as potential photocatalysts and the HOMO/LUMO energies were calculated with advanced DFT functionals. We proposed PorphyBERT, a BERT-based regression model that was pre-trained on PBDD and fine-tuned on MpPD. The model performed satisfactorily in predicting the HOMO energy, LUMO energy, and energy gap, with RMSEs of 0.0955, 0.0988, and 0.0787 eV and MAEs of 0.0774, 0.0824, and 0.0549 eV, respectively. Furthermore, owing to its unsupervised pre-training phase, the model is not affected by the difference in computational functionals between the pre-training and fine-tuning databases. Finally, we recommended 12 MpPs as potential photocatalysts for CO2 reduction, for which the out-of-sample model predictions of the energy gaps are close to the values calculated by DFT. A deep transfer learning approach is used to predict HOMO/LUMO energies of organic materials with a small amount of training data.
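As a reading aid for the error metrics quoted in the summary, the following is a minimal sketch of how RMSE and MAE are computed when model predictions are compared against DFT reference values. It is not the authors' code: the function names are hypothetical, the energy gap is assumed to follow the usual definition (LUMO energy minus HOMO energy), and the numbers are made-up placeholders rather than data from the paper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences (eV)."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, reference):
    """Mean absolute error between two equal-length sequences (eV)."""
    absolute = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(absolute) / len(absolute)


def energy_gap(homo, lumo):
    """HOMO-LUMO gap, assuming the usual definition E_LUMO - E_HOMO (eV)."""
    return lumo - homo


# Placeholder values for illustration only; NOT taken from the paper.
dft_gaps = [2.0, 2.5, 3.0]        # hypothetical DFT reference gaps (eV)
predicted_gaps = [2.1, 2.4, 3.2]  # hypothetical model predictions (eV)

print(f"RMSE = {rmse(predicted_gaps, dft_gaps):.4f} eV")
print(f"MAE  = {mae(predicted_gaps, dft_gaps):.4f} eV")
```

RMSE penalizes large individual errors more heavily than MAE, which is why the summary reports both for each target.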
ISSN: 1463-9076, 1463-9084
DOI: 10.1039/d3cp00917c