
nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even...


Bibliographic Details
Published in: Physical chemistry chemical physics : PCCP, 2022-11, Vol.24 (42), p.25853-25863
Main Authors: Khrabrov, Kuzma ; Shenbin, Ilya ; Ryabov, Alexander ; Tsypin, Artem ; Telepov, Alexander ; Alekseev, Anton ; Grishin, Alexander ; Strashnov, Pavel ; Zhilyaev, Petr ; Nikolenko, Sergey ; Kadurin, Artur
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663
cites cdi_FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663
container_end_page 25863
container_issue 42
container_start_page 25853
container_title Physical chemistry chemical physics : PCCP
container_volume 24
creator Khrabrov, Kuzma
Shenbin, Ilya
Ryabov, Alexander
Tsypin, Artem
Telepov, Alexander
Alekseev, Anton
Grishin, Alexander
Strashnov, Pavel
Zhilyaev, Petr
Nikolenko, Sergey
Kadurin, Artur
description Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) make it possible to compute an approximation of the wave function but are computationally very expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models fail to generalize across different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry. In this work we present nablaDFT, a new dataset and benchmark for Density Functional Theory Hamiltonian and energy prediction. We provide data for over 1 million different molecules and over 5 million conformations, along with baseline models for both tasks.
doi_str_mv 10.1039/d2cp03966d
format article
fullrecord <record><control><sourceid>proquest_rsc_p</sourceid><recordid>TN_cdi_rsc_primary_d2cp03966d</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2730997479</sourcerecordid><originalsourceid>FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663</originalsourceid><addsrcrecordid>eNpd0UtLw0AQAOAgCtbqxbsQ8CJCdF_ZdL1J2lqhYMF6E8LsozU12a276aH_3q2VCp5mGD6GeSTJJUZ3GFFxr4lax8i5Pkp6mHGaCTRgx4e84KfJWQgrhBDOMe0l7xZkA8Px_CGdgl-a7FVBY9LS2YXzLXS1s9CkI2v8cpuC1ekE2rrpnK3BpjNvdK12JpXGqo8W_OcP0tBBMN15crKAJpiL39hP3sajeTnJpi9Pz-XjNFMUsy4TVEiS5zDgRlIuCORaSy0YkpIVCgvNKTAu47wolgUSmhCKBgSBIaA4p_3kZt937d3XxoSuauugTNOANW4TKlKQAWYFYTjS63905TY-7rhTFAlRsEJEdbtXyrsQvFlUa1_H7bYVRtXu0NWQlLOfQw8jvtpjH9TB_T2CfgNVz3kZ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2730997479</pqid></control><display><type>article</type><title>nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset</title><source>Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)</source><creator>Khrabrov, Kuzma ; Shenbin, Ilya ; Ryabov, Alexander ; Tsypin, Artem ; Telepov, Alexander ; Alekseev, Anton ; Grishin, Alexander ; Strashnov, Pavel ; Zhilyaev, Petr ; Nikolenko, Sergey ; Kadurin, Artur</creator><creatorcontrib>Khrabrov, Kuzma ; Shenbin, Ilya ; Ryabov, Alexander ; Tsypin, Artem ; Telepov, Alexander ; Alekseev, Anton ; Grishin, Alexander ; Strashnov, Pavel ; Zhilyaev, Petr ; Nikolenko, Sergey ; Kadurin, Artur</creatorcontrib><description>Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. 
Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) allow to compute an approximation of the wave function but are very computationally expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electron structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models lack generalization over different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry. In this work we present nablaDFT, the new dataset and benchmark for the Density Functional Theory Hamiltonian and energy prediction. 
We provide data for over 1 million different molecules and over 5 million conformations and baseline models for both tasks.</description><identifier>ISSN: 1463-9076</identifier><identifier>EISSN: 1463-9084</identifier><identifier>DOI: 10.1039/d2cp03966d</identifier><language>eng</language><publisher>Cambridge: Royal Society of Chemistry</publisher><subject>Approximation ; Benchmarks ; Chemical properties ; Chemistry ; Computing costs ; Datasets ; Density functional theory ; Machine learning ; Quantum chemistry ; Wave functions</subject><ispartof>Physical chemistry chemical physics : PCCP, 2022-11, Vol.24 (42), p.25853-25863</ispartof><rights>Copyright Royal Society of Chemistry 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663</citedby><cites>FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663</cites><orcidid>0000-0001-6953-8317 ; 0000-0002-0754-759X ; 0000-0002-6778-225X ; 0000-0002-7280-1531 ; 0000-0001-6456-3329 ; 0000-0001-5001-146X ; 0000-0001-9662-6128 ; 0000-0001-7787-2251 ; 0000-0003-1482-9365 ; 0000-0002-0446-6751</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Khrabrov, Kuzma</creatorcontrib><creatorcontrib>Shenbin, Ilya</creatorcontrib><creatorcontrib>Ryabov, Alexander</creatorcontrib><creatorcontrib>Tsypin, Artem</creatorcontrib><creatorcontrib>Telepov, Alexander</creatorcontrib><creatorcontrib>Alekseev, Anton</creatorcontrib><creatorcontrib>Grishin, Alexander</creatorcontrib><creatorcontrib>Strashnov, Pavel</creatorcontrib><creatorcontrib>Zhilyaev, Petr</creatorcontrib><creatorcontrib>Nikolenko, Sergey</creatorcontrib><creatorcontrib>Kadurin, 
Artur</creatorcontrib><title>nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset</title><title>Physical chemistry chemical physics : PCCP</title><description>Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) allow to compute an approximation of the wave function but are very computationally expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electron structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models lack generalization over different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry. In this work we present nablaDFT, the new dataset and benchmark for the Density Functional Theory Hamiltonian and energy prediction. 
We provide data for over 1 million different molecules and over 5 million conformations and baseline models for both tasks.</description><subject>Approximation</subject><subject>Benchmarks</subject><subject>Chemical properties</subject><subject>Chemistry</subject><subject>Computing costs</subject><subject>Datasets</subject><subject>Density functional theory</subject><subject>Machine learning</subject><subject>Quantum chemistry</subject><subject>Wave functions</subject><issn>1463-9076</issn><issn>1463-9084</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNpd0UtLw0AQAOAgCtbqxbsQ8CJCdF_ZdL1J2lqhYMF6E8LsozU12a276aH_3q2VCp5mGD6GeSTJJUZ3GFFxr4lax8i5Pkp6mHGaCTRgx4e84KfJWQgrhBDOMe0l7xZkA8Px_CGdgl-a7FVBY9LS2YXzLXS1s9CkI2v8cpuC1ekE2rrpnK3BpjNvdK12JpXGqo8W_OcP0tBBMN15crKAJpiL39hP3sajeTnJpi9Pz-XjNFMUsy4TVEiS5zDgRlIuCORaSy0YkpIVCgvNKTAu47wolgUSmhCKBgSBIaA4p_3kZt937d3XxoSuauugTNOANW4TKlKQAWYFYTjS63905TY-7rhTFAlRsEJEdbtXyrsQvFlUa1_H7bYVRtXu0NWQlLOfQw8jvtpjH9TB_T2CfgNVz3kZ</recordid><startdate>20221102</startdate><enddate>20221102</enddate><creator>Khrabrov, Kuzma</creator><creator>Shenbin, Ilya</creator><creator>Ryabov, Alexander</creator><creator>Tsypin, Artem</creator><creator>Telepov, Alexander</creator><creator>Alekseev, Anton</creator><creator>Grishin, Alexander</creator><creator>Strashnov, Pavel</creator><creator>Zhilyaev, Petr</creator><creator>Nikolenko, Sergey</creator><creator>Kadurin, Artur</creator><general>Royal Society of 
Chemistry</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>L7M</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6953-8317</orcidid><orcidid>https://orcid.org/0000-0002-0754-759X</orcidid><orcidid>https://orcid.org/0000-0002-6778-225X</orcidid><orcidid>https://orcid.org/0000-0002-7280-1531</orcidid><orcidid>https://orcid.org/0000-0001-6456-3329</orcidid><orcidid>https://orcid.org/0000-0001-5001-146X</orcidid><orcidid>https://orcid.org/0000-0001-9662-6128</orcidid><orcidid>https://orcid.org/0000-0001-7787-2251</orcidid><orcidid>https://orcid.org/0000-0003-1482-9365</orcidid><orcidid>https://orcid.org/0000-0002-0446-6751</orcidid></search><sort><creationdate>20221102</creationdate><title>nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset</title><author>Khrabrov, Kuzma ; Shenbin, Ilya ; Ryabov, Alexander ; Tsypin, Artem ; Telepov, Alexander ; Alekseev, Anton ; Grishin, Alexander ; Strashnov, Pavel ; Zhilyaev, Petr ; Nikolenko, Sergey ; Kadurin, Artur</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Approximation</topic><topic>Benchmarks</topic><topic>Chemical properties</topic><topic>Chemistry</topic><topic>Computing costs</topic><topic>Datasets</topic><topic>Density functional theory</topic><topic>Machine learning</topic><topic>Quantum chemistry</topic><topic>Wave functions</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Khrabrov, Kuzma</creatorcontrib><creatorcontrib>Shenbin, Ilya</creatorcontrib><creatorcontrib>Ryabov, Alexander</creatorcontrib><creatorcontrib>Tsypin, Artem</creatorcontrib><creatorcontrib>Telepov, 
Alexander</creatorcontrib><creatorcontrib>Alekseev, Anton</creatorcontrib><creatorcontrib>Grishin, Alexander</creatorcontrib><creatorcontrib>Strashnov, Pavel</creatorcontrib><creatorcontrib>Zhilyaev, Petr</creatorcontrib><creatorcontrib>Nikolenko, Sergey</creatorcontrib><creatorcontrib>Kadurin, Artur</creatorcontrib><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>MEDLINE - Academic</collection><jtitle>Physical chemistry chemical physics : PCCP</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Khrabrov, Kuzma</au><au>Shenbin, Ilya</au><au>Ryabov, Alexander</au><au>Tsypin, Artem</au><au>Telepov, Alexander</au><au>Alekseev, Anton</au><au>Grishin, Alexander</au><au>Strashnov, Pavel</au><au>Zhilyaev, Petr</au><au>Nikolenko, Sergey</au><au>Kadurin, Artur</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset</atitle><jtitle>Physical chemistry chemical physics : PCCP</jtitle><date>2022-11-02</date><risdate>2022</risdate><volume>24</volume><issue>42</issue><spage>25853</spage><epage>25863</epage><pages>25853-25863</pages><issn>1463-9076</issn><eissn>1463-9084</eissn><abstract>Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. 
Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) allow to compute an approximation of the wave function but are very computationally expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electron structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models lack generalization over different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry. In this work we present nablaDFT, the new dataset and benchmark for the Density Functional Theory Hamiltonian and energy prediction. We provide data for over 1 million different molecules and over 5 million conformations and baseline models for both tasks.</abstract><cop>Cambridge</cop><pub>Royal Society of Chemistry</pub><doi>10.1039/d2cp03966d</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0001-6953-8317</orcidid><orcidid>https://orcid.org/0000-0002-0754-759X</orcidid><orcidid>https://orcid.org/0000-0002-6778-225X</orcidid><orcidid>https://orcid.org/0000-0002-7280-1531</orcidid><orcidid>https://orcid.org/0000-0001-6456-3329</orcidid><orcidid>https://orcid.org/0000-0001-5001-146X</orcidid><orcidid>https://orcid.org/0000-0001-9662-6128</orcidid><orcidid>https://orcid.org/0000-0001-7787-2251</orcidid><orcidid>https://orcid.org/0000-0003-1482-9365</orcidid><orcidid>https://orcid.org/0000-0002-0446-6751</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1463-9076
ispartof Physical chemistry chemical physics : PCCP, 2022-11, Vol.24 (42), p.25853-25863
issn 1463-9076
1463-9084
language eng
recordid cdi_rsc_primary_d2cp03966d
source Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)
subjects Approximation
Benchmarks
Chemical properties
Chemistry
Computing costs
Datasets
Density functional theory
Machine learning
Quantum chemistry
Wave functions
title nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T04%3A57%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_rsc_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=nablaDFT:%20Large-Scale%20Conformational%20Energy%20and%20Hamiltonian%20Prediction%20benchmark%20and%20dataset&rft.jtitle=Physical%20chemistry%20chemical%20physics%20:%20PCCP&rft.au=Khrabrov,%20Kuzma&rft.date=2022-11-02&rft.volume=24&rft.issue=42&rft.spage=25853&rft.epage=25863&rft.pages=25853-25863&rft.issn=1463-9076&rft.eissn=1463-9084&rft_id=info:doi/10.1039/d2cp03966d&rft_dat=%3Cproquest_rsc_p%3E2730997479%3C/proquest_rsc_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c314t-939b255a86eb3692a5ddbd940bb47c19d63a46b5130bd9909d2230820ae2ac663%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2730997479&rft_id=info:pmid/&rfr_iscdi=true