Pre-trained Gaussian Processes for Bayesian Optimization

Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs on functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. We detail what pre-training entails for GPs using a KL divergence based loss function, and propose a new pre-training based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known. To verify our approach in realistic setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks.
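The core idea the abstract describes, learning a GP prior from data on similar functions rather than specifying it by hand, can be illustrated with a minimal sketch. This is not the paper's actual method: HyperBO uses a KL-divergence-based objective, while the sketch below substitutes a simple summed negative log marginal likelihood over tasks and a grid search; the names `pretrain_gp`, `gp_nll`, and the synthetic tasks are all hypothetical illustrations.

```python
import numpy as np

def rbf_kernel(x, lengthscale, variance):
    # Squared-exponential kernel matrix for 1-D inputs.
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_nll(y, K, noise=1e-2):
    # Negative log marginal likelihood of y under GP(0, K + noise*I),
    # computed via a Cholesky factorization for numerical stability.
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

def pretrain_gp(tasks, grid):
    # "Pre-training" stand-in: choose the kernel hyperparameters that
    # minimize the summed NLL across all related tasks.
    best, best_loss = None, np.inf
    for ls, var in grid:
        loss = sum(gp_nll(y, rbf_kernel(x, ls, var)) for x, y in tasks)
        if loss < best_loss:
            best, best_loss = (ls, var), loss
    return best

rng = np.random.default_rng(0)
# Synthetic "similar functions": GP draws sharing a true lengthscale of 0.5.
x = np.linspace(0.0, 3.0, 20)
true_K = rbf_kernel(x, 0.5, 1.0) + 1e-6 * np.eye(len(x))
tasks = [(x, rng.multivariate_normal(np.zeros(len(x)), true_K)) for _ in range(8)]

grid = [(ls, var) for ls in (0.1, 0.5, 2.0) for var in (0.5, 1.0, 2.0)]
ls_hat, var_hat = pretrain_gp(tasks, grid)
print(ls_hat, var_hat)
```

With several tasks drawn from the same underlying prior, the pooled likelihood concentrates on the shared hyperparameters, which is the intuition behind pre-training a tighter prior from related functions; the fitted GP would then be used as the prior inside an ordinary BO loop.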

Bibliographic Details
Published in: arXiv.org, 2024-08
Main Authors: Wang, Zi, Dahl, George E, Swersky, Kevin, Lee, Chansoo, Nado, Zachary, Gilmer, Justin, Snoek, Jasper, Ghahramani, Zoubin
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Artificial neural networks; Bayesian analysis; Datasets; Decision theory; Experimentation; Iterative methods; Mathematical models; Neural networks; Optimization; Parameter sensitivity; Training; Tuning