
Risk Bounds on MDL Estimators for Linear Regression Models with Application to Simple ReLU Neural Networks


Bibliographic Details
Published in: arXiv.org, 2024-11
Main Authors: Takeishi, Yoshinari; Takeuchi, Jun'ichi
Format: Article
Language: English
Description: To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds of MDL estimators based on two-stage codes for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we propose a method to design two-stage codes for linear regression models and establish an upper bound on the risk of the corresponding MDL estimators, based on the theory of MDL estimators originated by Barron and Cover (1991). We then apply this result to simple two-layer NNs with ReLU activation consisting of \(d\) nodes in the input layer, \(m\) nodes in the hidden layer, and one output node. Since the object of estimation is only the \(m\) weights from the hidden layer to the output node in our setting, this is an example of a linear regression model. As a result, we show that the redundancy of the obtained two-stage codes is small, owing to the fact that the eigenvalue distribution of the Fisher information matrix of the NNs is strongly biased, which was shown by Takeishi et al. (2023) and is refined in this paper. That is, we establish a tight upper bound on the risk of our MDL estimators. Note that our risk bound for the simple ReLU networks, whose leading term is \(O(d^2 \log n / n)\), is independent of the number of parameters \(m\).
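The risk bound described above rests on the two-stage-code argument of Barron and Cover (1991). The following is a sketch of the standard form of that result as it is usually stated, not necessarily the exact constants or conditions used in this paper:

```latex
% Two-stage code: first encode a quantized parameter \tilde\theta with a
% prefix code of length L(\tilde\theta), then the data x^n with
% -\log p_{\tilde\theta}(x^n) bits.  The MDL estimator minimizes the total:
\[
  \hat\theta \;=\; \operatorname*{arg\,min}_{\tilde\theta}
    \Bigl\{ L(\tilde\theta) \;-\; \log p_{\tilde\theta}(x^n) \Bigr\}.
\]
% If the codelengths L satisfy a Kraft-type inequality (in Barron and
% Cover's formulation, a slightly inflated one), the risk of \hat\theta in
% squared Hellinger distance d_H^2 is bounded, up to a universal constant
% c, by the index of resolvability of the true source p^\ast:
\[
  \mathbb{E}\, d_H^2\bigl(p^\ast,\, p_{\hat\theta}\bigr)
  \;\le\; c \,\min_{\tilde\theta}
    \Bigl\{ \tfrac{1}{n} L(\tilde\theta)
      + D\bigl(p^\ast \,\Vert\, p_{\tilde\theta}\bigr) \Bigr\}.
\]
% Estimation is accurate whenever some describable parameter is both cheap
% to encode (small L) and close to the truth (small KL divergence D).
```

In this light, the paper's contribution can be read as designing a two-stage code whose redundancy exploits the strongly biased Fisher eigenvalue spectrum of the ReLU network, so that the minimum on the right-hand side is of order \(d^2 \log n / n\), the leading term quoted in the abstract.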
EISSN: 2331-8422
Source: Publicly Available Content Database
Subjects: Eigenvalues; Estimators; Fisher information; Machine learning; Neural networks; Nodes; Redundancy; Regression analysis; Regression models; Risk analysis; Upper bounds