Low-dimensional intrinsic dimension reveals a phase transition in gradient-based learning of deep neural networks
Published in: International Journal of Machine Learning and Cybernetics, 2024-11, Vol. 15 (11), pp. 5381-5394
Main Authors: Tan, Chengli; Zhang, Jiangshe; Liu, Junmin; Zhao, Zixiang
Format: Article
Language: English
ISSN: 1868-8071 (print); 1868-808X (electronic)
DOI: 10.1007/s13042-024-02244-x
Publisher: Springer Berlin Heidelberg
Subjects: Artificial Intelligence; Artificial neural networks; Biological activity; Biological effects; Complex Systems; Computational Intelligence; Control; Datasets; Engineering; Hypotheses; Learning curves; Machine learning; Mechatronics; Neural networks; Original Article; Pattern Recognition; Phase transitions; Power law; Random variables; Representations; Robotics; Robustness; Systems Biology; Training
Abstract: Deep neural networks perform feature extraction by propagating inputs through multiple modules, yet how the representations evolve under gradient-based optimization remains unknown. Here we leverage the intrinsic dimension of the representations to study the learning dynamics and find that training undergoes a phase transition from expansion to compression under disparate training regimes. Surprisingly, this phenomenon is ubiquitous across a wide variety of model architectures, optimizers, and data sets. We demonstrate that the variation in the intrinsic dimension is consistent with the complexity of the learned hypothesis, which can be quantitatively assessed by the critical sample ratio rooted in adversarial robustness. We also show mathematically that this phenomenon can be analyzed in terms of the mutable correlation between neurons. Although evoked activities obey a power-law decay rule in biological circuits, we find that the power-law exponent of the representations in deep neural networks predicts adversarial robustness well only at the end of training, not during it. Together, these results suggest that deep neural networks produce robust representations by adaptively eliminating or retaining redundancies. The code is publicly available at https://github.com/cltan023/learning2022.
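
The abstract turns on two layer-level statistics: the intrinsic dimension (ID) of the representations and the power-law exponent of their eigenspectrum. As a rough illustration of how such quantities are commonly measured, the sketch below assumes the TwoNN estimator of Facco et al. (2017) for the ID and a log-log linear fit of the covariance eigenvalues for the exponent; these are standard choices but not necessarily the exact procedures of the paper or the learning2022 repository, and the `layer_statistics` helper, its hook-based activation capture, and all parameter names are illustrative.

```python
# Minimal sketch, assuming the TwoNN ID estimator and a log-log eigenspectrum
# fit; not taken from the learning2022 code.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors


def twonn_id(X: np.ndarray) -> float:
    """TwoNN maximum-likelihood intrinsic-dimension estimate (Facco et al., 2017).

    X is an (n_samples, n_features) array of one layer's representations.
    Under the TwoNN model, mu = r2/r1 (the ratio of second- to first-nearest-
    neighbor distance) is Pareto-distributed with shape d, so the MLE of the
    dimension is n / sum(log mu).
    """
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    # dist[:, 0] is each point's distance to itself (zero); columns 1 and 2
    # hold the first- and second-nearest-neighbor distances r1 and r2.
    mu = dist[:, 2] / np.maximum(dist[:, 1], 1e-12)
    mu = mu[mu > 1.0]  # drop duplicated points, which give log(mu) = 0
    return mu.size / np.sum(np.log(mu))


def spectral_exponent(X: np.ndarray) -> float:
    """Power-law exponent alpha of the covariance eigenspectrum.

    Fits log(eigenvalue) against log(rank) and returns the negated slope,
    i.e. alpha in lambda_k ~ k^(-alpha).
    """
    X = X - X.mean(axis=0)
    eig = np.linalg.svd(X, compute_uv=False) ** 2 / (len(X) - 1)
    eig = eig[eig > 1e-12]  # discard numerically zero directions
    slope, _ = np.polyfit(np.log(np.arange(1, eig.size + 1)), np.log(eig), 1)
    return -slope


def layer_statistics(model, layer, loader, n_max=2000):
    """Capture `layer`'s activations on a probe set; return (ID, alpha)."""
    feats = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: feats.append(out.flatten(1).detach().cpu()))
    model.eval()
    with torch.no_grad():
        for x, _ in loader:
            model(x)
            if sum(f.shape[0] for f in feats) >= n_max:
                break
    hook.remove()
    X = torch.cat(feats)[:n_max].numpy()
    return twonn_id(X), spectral_exponent(X)
```

Under these assumptions, logging both statistics for each layer every few epochs suffices to trace the expansion-to-compression transition the abstract describes, and to compare how well the exponent predicts robustness early versus late in training.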