Risk Bound on MDL Estimator for Simple ReLU Networks
To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds on MDL estimators for simple two-layer neural networks (NNs) with ReLU activation. To that end, we construct a two-stage code with small redundancy, based on the fact that the eigenvalue distribution of the Fisher information matrix of these NNs is strongly biased, as recently shown by Takeishi et al. (2023). As a result, the MDL estimator induced by the two-stage code enjoys a tight upper bound on its risk, a direct consequence of the theory of MDL estimators originated by Barron and Cover (1991). The target NNs consist of d nodes in the input layer, p nodes in the hidden layer, and one output node. Only the p weights from the hidden layer to the output node are estimated. In the context of the large-scale neural networks of interest to us, it is assumed that $p \gg d$. Note that the leading term of our risk bound is $O(d^{2}\log n / n)$, independent of p.
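As a rough illustration of the setting described in the abstract (a sketch inferred from the abstract alone; the paper's exact parameterization, loss, and constants may differ), the network class and the shape of the stated bound can be written in LaTeX as:

% Sketch only: the hidden-layer weights w_j (and any bias terms) are treated as fixed,
% and only the p output weights a_1, ..., a_p are the object of estimation.
\[
  f_{a}(x) \;=\; \sum_{j=1}^{p} a_{j}\,\mathrm{ReLU}\!\left(w_{j}^{\top} x\right),
  \qquad x \in \mathbb{R}^{d}, \quad p \gg d,
\]
\[
  \text{risk of the MDL estimator} \;=\; O\!\left(\frac{d^{2}\log n}{n}\right)
  \quad \text{(leading term, independent of } p\text{)},
\]
where $n$ denotes the sample size.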
Main Authors: Takeishi, Yoshinari; Takeuchi, Jun'ichi
Format: Conference Proceeding
Language: English
Subjects: Artificial neural networks; Codes; Deep learning; Eigenvalues and eigenfunctions; Estimation; Redundancy; Upper bound
Published in: 2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 256-261
DOI: 10.1109/ISIT57864.2024.10619170
EISSN: 2157-8117
EISBN: 9798350382846
Online Access: Request full text