Bolstering stochastic gradient descent with model building

Published in: TOP, 2024-10, Vol. 32 (3), pp. 517-536
Main Authors: Birbil, Ş. İlker; Martin, Özgür; Onay, Gönenç; Öztoprak, Figen
Publisher: Springer Berlin Heidelberg (Berlin/Heidelberg)
Format: Article
Language: English
Source: Springer Link
DOI: 10.1007/s11750-024-00673-z
ISSN: 1134-5764
EISSN: 1863-8279
Subjects: Business and Management; Economic Theory/Quantitative Economics/Mathematical Methods; Economics; Finance; Industrial and Production Engineering; Insurance; Management; Operations Research/Decision Theory; Optimization; Original Paper; Statistics for Business

Abstract: The stochastic gradient descent (SGD) method and its variants are the core optimization algorithms for solving machine learning problems, achieving good convergence rates especially when they are fine-tuned for the application at hand. Although this tuning process can incur large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the step length. We propose an alternative to stochastic line search: a new algorithm, stochastic model building (SMB), based on forward-step model building. The model-building step incorporates second-order information, which allows adjusting not only the step length but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and computes a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis and show experimentally that the proposed algorithm achieves faster convergence and better generalization on well-known test problems. More precisely, SMB requires less tuning and performs comparably to other adaptive methods.
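
Since the abstract only outlines the mechanics, the following is a minimal sketch of the forward-step, per-parameter-group idea under stated assumptions: it is not the paper's exact SMB update. The one-dimensional quadratic model with a secant curvature estimate stands in for the paper's model-building formulas, and the names `smb_like_step` and `toy_loss` are hypothetical.

```python
# Hypothetical sketch of the forward-step idea from the abstract -- NOT the
# paper's exact SMB update. After a trial SGD step, curvature along each
# parameter group's step is estimated from the gradients at the current and
# trial points (a secant estimate), and each group's step is rescaled by
# minimizing the resulting one-dimensional quadratic model.
import numpy as np

def smb_like_step(params, loss_and_grads, alpha=0.1, t_max=10.0):
    """One forward-step update over a dict of per-group parameter arrays.

    loss_and_grads(params) must return (loss, {name: gradient array}).
    """
    _, grads = loss_and_grads(params)
    # Trial (forward) SGD step, taken jointly for every group.
    trial = {k: v - alpha * grads[k] for k, v in params.items()}
    _, trial_grads = loss_and_grads(trial)

    new_params = {}
    for k, x in params.items():
        d = -alpha * grads[k]                   # this group's trial step
        s0 = float(np.sum(grads[k] * d))        # directional derivative at t = 0
        s1 = float(np.sum(trial_grads[k] * d))  # directional derivative at t = 1
        curv = s1 - s0                          # secant curvature along d
        if curv > 1e-12:
            t = min(-s0 / curv, t_max)          # minimizer of the quadratic model
        else:
            t = 1.0                             # no usable curvature: keep trial step
        new_params[k] = x + t * d               # per-group adaptive step length
    return new_params

# Toy usage: a separable quadratic whose two "groups" need very different
# step lengths; a single fixed alpha would have to be tuned per group.
def toy_loss(p):
    loss = float(np.sum(p["w1"] ** 2) + 10.0 * np.sum(p["w2"] ** 2))
    return loss, {"w1": 2.0 * p["w1"], "w2": 20.0 * p["w2"]}

p = {"w1": np.ones(3), "w2": np.ones(2)}
for _ in range(3):
    p = smb_like_step(p, toy_loss)
print(toy_loss(p)[0])  # loss drops to ~0 without per-group step tuning
```

Note that in the paper the model-building step also adjusts the search direction, not just its length; this sketch rescales along the fixed trial direction only, to keep the illustration short, and each update costs two loss/gradient evaluations, matching the forward-step pattern.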