
Benign Overfitting for Regression with Trained Two-Layer ReLU Networks

We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation, trained by gradient flow. Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are bounded. We operate in the neural tangent kernel regime, and our generalization result is developed via a decomposition of the excess risk into estimation and approximation errors, viewing gradient flow as an implicit regularizer. This decomposition, in the context of neural networks, is a novel perspective on gradient descent and helps us avoid uniform convergence traps. In this work, we also establish that under the same setting, the trained network overfits to the data. Together, these results establish the first result on benign overfitting for finite-width ReLU networks with arbitrary regression functions.
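For readers who want to ground the setting described in the abstract, below is a minimal NumPy sketch of a width-m two-layer fully-connected ReLU network fit to a least-squares objective by full-batch gradient descent with a small step size, used as a discrete-time stand-in for gradient flow. This is not the authors' code: the 1/sqrt(m) scaling, the frozen random second layer, the synthetic data, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares regression data (illustrative only; the paper assumes
# nothing about target or noise beyond boundedness).
n, d, m = 64, 5, 1024                      # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Two-layer fully-connected ReLU network with an NTK-style 1/sqrt(m) scaling:
#   f(x) = (1/sqrt(m)) * a^T relu(W x)
# Freezing the second layer at random signs is a common NTK simplification;
# the paper's exact parameterization and initialization may differ.
W = rng.standard_normal((m, d))
a = rng.choice(np.array([-1.0, 1.0]), size=m)

def predict(X):
    H = np.maximum(X @ W.T, 0.0)           # hidden ReLU activations, shape (n, m)
    return H @ a / np.sqrt(m)

# Full-batch gradient descent on the loss (1/2n)*||f(X) - y||^2 with a small
# step size, as a discrete-time surrogate for gradient flow.
lr, steps = 0.5, 2000
for _ in range(steps):
    H = np.maximum(X @ W.T, 0.0)
    r = H @ a / np.sqrt(m) - y             # residuals f(X) - y
    # dL/dW[j] = (1/(n*sqrt(m))) * a_j * sum_i r_i * 1[w_j . x_i > 0] * x_i
    grad_W = (X.T @ ((H > 0) * r[:, None] * a[None, :])).T / (n * np.sqrt(m))
    W -= lr * grad_W

print("training MSE:", np.mean((predict(X) - y) ** 2))
```

Running the loop drives the training error toward the noise floor, which is the "overfitting" half of the benign-overfitting story; the paper's contribution is showing that, in this regime, generalization is nevertheless controlled.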


Bibliographic Details
Published in: arXiv.org, 2024-10
Main Authors: Park, Junhyung; Bloebaum, Patrick; Shiva Prasad Kasiviswanathan
Format: Article
Language: English
Subjects: Decomposition; Gradient flow; Neural networks; Regression
container_title arXiv.org
creator Park, Junhyung; Bloebaum, Patrick; Shiva Prasad Kasiviswanathan
description We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation, trained by gradient flow. Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are bounded. We operate in the neural tangent kernel regime, and our generalization result is developed via a decomposition of the excess risk into estimation and approximation errors, viewing gradient flow as an implicit regularizer. This decomposition, in the context of neural networks, is a novel perspective on gradient descent and helps us avoid uniform convergence traps. In this work, we also establish that under the same setting, the trained network overfits to the data. Together, these results establish the first result on benign overfitting for finite-width ReLU networks with arbitrary regression functions.
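The estimation/approximation split mentioned in the description can be written schematically as follows; the notation here is assumed for illustration, not taken from the paper. With \(\hat f_t\) the network trained by gradient flow up to time \(t\), \(f^*\) the underlying regression function, \(\tilde f\) any reference function used for comparison, and \(\mathcal{R}\) the population squared-error risk,

$$
\underbrace{\mathcal{R}(\hat f_t)-\mathcal{R}(f^*)}_{\text{excess risk}}
=\underbrace{\bigl[\mathcal{R}(\hat f_t)-\mathcal{R}(\tilde f)\bigr]}_{\text{estimation error}}
+\underbrace{\bigl[\mathcal{R}(\tilde f)-\mathcal{R}(f^*)\bigr]}_{\text{approximation error}},
$$

and the abstract indicates that the generalization bound is obtained by controlling these two terms, viewing gradient flow as an implicit regularizer.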
format article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-10
issn 2331-8422
language eng
recordid cdi_proquest_journals_3115226868
source Publicly Available Content (ProQuest)
subjects Decomposition; Gradient flow; Neural networks; Regression
title Benign Overfitting for Regression with Trained Two-Layer ReLU Networks
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T15%3A51%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Benign%20Overfitting%20for%20Regression%20with%20Trained%20Two-Layer%20ReLU%20Networks&rft.jtitle=arXiv.org&rft.au=Park,%20Junhyung&rft.date=2024-10-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3115226868%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_31152268683%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3115226868&rft_id=info:pmid/&rfr_iscdi=true