Benign Overfitting for Regression with Trained Two-Layer ReLU Networks
We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation, trained by gradient flow. Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are...
Published in: | arXiv.org 2024-10 |
---|---|
Main Authors: | Park, Junhyung; Bloebaum, Patrick; Shiva Prasad Kasiviswanathan |
Format: | Article |
Language: | English |
Subjects: | Decomposition; Gradient flow; Neural networks; Regression |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Park, Junhyung; Bloebaum, Patrick; Shiva Prasad Kasiviswanathan |
description | We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation, trained by gradient flow. Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are bounded. We operate in the neural tangent kernel regime, and our generalization result is developed via a decomposition of the excess risk into estimation and approximation errors, viewing gradient flow as an implicit regularizer. This decomposition in the context of neural networks is a novel perspective on gradient descent and helps us avoid uniform convergence traps. In this work, we also establish that, under the same setting, the trained network overfits the data. Together, these results establish the first benign-overfitting result for finite-width ReLU networks and arbitrary regression functions. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3115226868 |
source | Publicly Available Content (ProQuest) |
subjects | Decomposition; Gradient flow; Neural networks; Regression |
title | Benign Overfitting for Regression with Trained Two-Layer ReLU Networks |
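The setting described in the abstract above (a two-layer fully-connected ReLU network fit to data by gradient flow on the least-squares objective) can be illustrated with a short sketch. The snippet below is not the authors' code: the hidden width, step size, initialization, fixed second layer, and toy data are illustrative assumptions, and full-batch gradient descent with a small constant step stands in for gradient flow as its Euler discretization.

```python
# Minimal sketch (assumptions noted above): a two-layer ReLU network in an
# NTK-style 1/sqrt(m) parameterization, trained on the least-squares loss by
# full-batch gradient descent, read as an Euler discretization of gradient flow.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 5, 1024                 # samples, input dimension, hidden width (assumed)
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # bounded toy target plus noise

W = rng.standard_normal((m, d))        # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)    # second-layer signs (kept fixed, a common simplification)

def predict(X, W, a):
    """f(x) = a^T relu(W x) / sqrt(m) for a two-layer fully-connected ReLU network."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])

lr, steps = 0.5, 2000                  # small constant step approximates gradient flow
for _ in range(steps):
    pre = X @ W.T                                       # pre-activations, shape (n, m)
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y   # f(x_i) - y_i
    # Gradient of (1/(2n)) * sum_i (f(x_i) - y_i)^2 w.r.t. W,
    # using the ReLU subgradient 1{pre > 0}.
    back = (pre > 0.0) * (a / np.sqrt(m))               # (n, m)
    grad_W = (back * resid[:, None]).T @ X / n          # (m, d)
    W -= lr * grad_W

print("training MSE:", np.mean((predict(X, W, a) - y) ** 2))
```

In the neural tangent kernel regime studied in the paper, the width would be taken large enough that the hidden-layer features stay close to their initialization throughout training; the constants above are only for illustration.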