Benign Overfitting for Regression with Trained Two-Layer ReLU Networks
We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation, trained by gradient flow. Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are...
Published in: | arXiv.org 2024-10 |
---|---|
Main Authors: | Park, Junhyung; Bloebaum, Patrick; Shiva Prasad Kasiviswanathan |
Format: | Article |
Language: | English |
Subjects: | Decomposition; Gradient flow; Neural networks; Regression |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Park, Junhyung; Bloebaum, Patrick; Shiva Prasad Kasiviswanathan |
description | We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation, trained by gradient flow. Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are bounded. We operate in the neural tangent kernel regime, and our generalization result is developed via a decomposition of the excess risk into estimation and approximation errors, viewing gradient flow as an implicit regularizer. This decomposition in the context of neural networks is a novel perspective on gradient descent and helps us avoid uniform convergence traps. In this work, we also establish that, under the same setting, the trained network overfits the data. Together, these results establish the first benign-overfitting result for finite-width ReLU networks and arbitrary regression functions. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3115226868 |
source | Publicly Available Content (ProQuest) |
subjects | Decomposition; Gradient flow; Neural networks; Regression |
title | Benign Overfitting for Regression with Trained Two-Layer ReLU Networks |
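The setting described in the abstract above (a two-layer fully-connected ReLU network fit to data by gradient flow on the least-squares objective) can be illustrated with a short sketch. The snippet below is not the authors' code: the hidden width, step size, initialization, fixed second layer, and toy data are illustrative assumptions, and full-batch gradient descent with a small constant step stands in for gradient flow as its Euler discretization.

```python
# Minimal sketch (assumptions noted above): a two-layer ReLU network in an
# NTK-style 1/sqrt(m) parameterization, trained on the least-squares loss by
# full-batch gradient descent, read as an Euler discretization of gradient flow.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 5, 1024                 # samples, input dimension, hidden width (assumed)
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # bounded toy target plus noise

W = rng.standard_normal((m, d))        # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)    # second-layer signs (kept fixed, a common simplification)

def predict(X, W, a):
    """f(x) = a^T relu(W x) / sqrt(m) for a two-layer fully-connected ReLU network."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])

lr, steps = 0.5, 2000                  # small constant step approximates gradient flow
for _ in range(steps):
    pre = X @ W.T                                       # pre-activations, shape (n, m)
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y   # f(x_i) - y_i
    # Gradient of (1/(2n)) * sum_i (f(x_i) - y_i)^2 w.r.t. W,
    # using the ReLU subgradient 1{pre > 0}.
    back = (pre > 0.0) * (a / np.sqrt(m))               # (n, m)
    grad_W = (back * resid[:, None]).T @ X / n          # (m, d)
    W -= lr * grad_W

print("training MSE:", np.mean((predict(X, W, a) - y) ** 2))
```

In the neural tangent kernel regime studied in the paper, the width would be taken large enough that the hidden-layer features stay close to their initialization throughout training; the constants above are only for illustration.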