
Risk Bound on MDL Estimator for Simple ReLU Networks

To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse the risk bounds on MDL estimators for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we construct a two-stage code with small redundancy based on the fact that the eigenvalue distribution of the Fisher information matrix of the NNs is strongly biased, which was recently shown by Takeishi et al. (2023). This means that the MDL estimator induced by the two-stage code enjoys a tight upper bound on its risk, which is a direct consequence of the theory on MDL estimators originated by Barron and Cover (1991). The target NNs consist of $d$ nodes in the input layer, $p$ nodes in the hidden layer, and one output node. The object of estimation is only the $p$ weights from the hidden layer to the output node. In the context of the large-scale neural networks of interest to us, it is assumed that $p \gg d$. Note that the leading term of our risk bound is $O(d^{2}\log n/n)$, independent of $p$.
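A brief note on how the last claim follows (a schematic statement of the Barron-Cover framework, not the paper's exact theorem; constants and regularity conditions are omitted): for an MDL estimator built from a two-stage code with codelengths $L_n(\theta)$ over a quantized family $\tilde{\Theta}_n$, the risk is controlled by the index of resolvability,

\[
\mathbb{E}\, d_H^2\!\left(p^{*}, p_{\hat{\theta}_n}\right)
\;\lesssim\;
\min_{\theta \in \tilde{\Theta}_n}
\left\{ \frac{L_n(\theta)}{n} + D\!\left(p^{*} \,\big\|\, p_{\theta}\right) \right\},
\]

where $d_H$ is the Hellinger distance and $D$ the per-sample Kullback-Leibler divergence. A two-stage code whose redundancy, and hence $L_n(\theta)$ on the relevant networks, grows only like $d^{2}\log n$, with an approximation term of at most the same order, therefore yields a risk of $O(d^{2}\log n/n)$, independent of $p$.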

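The following sketch (in Python, purely illustrative) sets up the kind of model and two-stage criterion the abstract describes: a two-layer ReLU network with $d$ inputs, $p$ hidden units, and one output, in which only the $p$ hidden-to-output weights are estimated, and an estimator chosen by minimizing a quantized-parameter code length plus a data code length. The dimensions, the Gaussian noise model, the least-squares pre-fit, and the crude $\log(1/\delta)$-per-coordinate parameter cost are all assumptions made here for illustration; the paper's actual construction is a low-redundancy code exploiting the biased eigenvalue distribution of the Fisher information matrix.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper); the paper's regime is p >> d.
d, p, n = 3, 64, 500

# Fixed, known input-to-hidden weights: only the p output weights are the object of estimation.
A = rng.standard_normal((p, d))
w_true = rng.standard_normal(p) / np.sqrt(p)

def hidden(X):
    """ReLU features of the hidden layer, shape (n, p)."""
    return np.maximum(X @ A.T, 0.0)

def forward(X, w):
    """Two-layer ReLU network: f(x) = sum_j w_j * ReLU(a_j . x)."""
    return hidden(X) @ w

# Synthetic regression data with Gaussian noise (illustrative noise level).
X = rng.standard_normal((n, d))
sigma = 0.1
y = forward(X, w_true) + sigma * rng.standard_normal(n)

# Unquantized least-squares fit of the output weights, used to seed the candidates.
H = hidden(X)
w_ls, *_ = np.linalg.lstsq(H, y, rcond=None)

def code_length(w_q, delta):
    """Total two-stage code length in nats, up to additive constants.
    First stage: encode the quantized weights on a grid of width delta
    (a crude log(1/delta) nats per coordinate, standing in for the paper's
    carefully designed low-redundancy code).
    Second stage: encode the data given those weights under the Gaussian model."""
    first_stage = p * np.log(1.0 / delta)
    resid = y - H @ w_q
    second_stage = 0.5 * np.sum(resid ** 2) / sigma ** 2
    return first_stage + second_stage

# MDL estimator: quantize at several resolutions and keep the candidate
# with the smallest total description length.
candidates = [(np.round(w_ls / delta) * delta, delta) for delta in (0.5, 0.1, 0.02, 0.004)]
w_mdl, delta_star = min(candidates, key=lambda c: code_length(*c))

print("chosen grid width:", delta_star)
print("in-sample squared error of MDL estimate:",
      float(np.mean((H @ w_mdl - forward(X, w_true)) ** 2)))

Varying the grid width delta trades off the two stages: a finer grid costs more to describe the parameters but leaves smaller residuals, and the MDL estimator is simply the candidate with the smallest total description length.
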

Bibliographic Details
Main Authors: Takeishi, Yoshinari, Takeuchi, Jun'ichi
Format: Conference Proceeding
Language: English
Subjects: Artificial neural networks; Codes; Deep learning; Eigenvalues and eigenfunctions; Estimation; Redundancy; Upper bound
Online Access: Request full text
cited_by
cites
container_end_page 261
container_issue
container_start_page 256
container_title
container_volume
creator Takeishi, Yoshinari
Takeuchi, Jun'ichi
description To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse the risk bounds on MDL estimators for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we construct a two-stage code with small redundancy based on the fact that the eigenvalue distribution of the Fisher information matrix of the NNs is strongly biased, which was recently shown by Takeishi et al. (2023). This means that the MDL estimator induced by the two-stage code enjoys a tight upper bound on its risk, which is a direct consequence of the theory on MDL estimators originated by Barron and Cover (1991). The target NNs consist of $d$ nodes in the input layer, $p$ nodes in the hidden layer, and one output node. The object of estimation is only the $p$ weights from the hidden layer to the output node. In the context of the large-scale neural networks of interest to us, it is assumed that $p \gg d$. Note that the leading term of our risk bound is $O(d^{2}\log n/n)$, independent of $p$.
doi_str_mv 10.1109/ISIT57864.2024.10619170
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2157-8117
ispartof 2024 IEEE International Symposium on Information Theory (ISIT), 2024, p.256-261
issn 2157-8117
language eng
recordid cdi_ieee_primary_10619170
source IEEE Xplore All Conference Series
subjects Artificial neural networks
Codes
Deep learning
Eigenvalues and eigenfunctions
Estimation
Redundancy
Upper bound
title Risk Bound on MDL Estimator for Simple ReLU Networks