Risk Bound on MDL Estimator for Simple ReLU Networks
To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds on MDL estimators for simple two-layer neural networks (NNs) with ReLU activation. To that end, we construct a two-stage code with small redundancy, based on the fact that the eigenvalue distribution of the Fisher information matrix of these NNs is strongly biased, as recently shown by Takeishi et al. (2023). As a result, the MDL estimator induced by the two-stage code enjoys a tight upper bound on its risk, a direct consequence of the theory of MDL estimators originated by Barron and Cover (1991). The target NNs consist of d nodes in the input layer, p nodes in the hidden layer, and one output node. Only the p weights from the hidden layer to the output node are estimated. In the context of the large-scale neural networks of interest to us, it is assumed that $p \gg d$. Note that the leading term of our risk bound is $O(d^{2}\log n / n)$, independent of p.
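As a rough illustration of the setting described in the abstract (a sketch inferred from the abstract alone; the paper's exact parameterization, loss, and constants may differ), the network class and the shape of the stated bound can be written in LaTeX as:

% Sketch only: the hidden-layer weights w_j (and any bias terms) are treated as fixed,
% and only the p output weights a_1, ..., a_p are the object of estimation.
\[
  f_{a}(x) \;=\; \sum_{j=1}^{p} a_{j}\,\mathrm{ReLU}\!\left(w_{j}^{\top} x\right),
  \qquad x \in \mathbb{R}^{d}, \quad p \gg d,
\]
\[
  \text{risk of the MDL estimator} \;=\; O\!\left(\frac{d^{2}\log n}{n}\right)
  \quad \text{(leading term, independent of } p\text{)},
\]
where $n$ denotes the sample size.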
Main Authors: Takeishi, Yoshinari; Takeuchi, Jun'ichi
Format: Conference Proceeding
Language: English
Subjects: Artificial neural networks; Codes; Deep learning; Eigenvalues and eigenfunctions; Estimation; Redundancy; Upper bound
Published in: 2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 256-261
DOI: 10.1109/ISIT57864.2024.10619170
EISSN: 2157-8117
EISBN: 9798350382846
Online Access: Request full text