
Low-dimensional intrinsic dimension reveals a phase transition in gradient-based learning of deep neural networks

Deep neural networks perform feature extraction by propagating inputs through multiple modules. However, how the representations evolve under gradient-based optimization remains unknown. Here we leverage the intrinsic dimension of the representations to study the learning dynamics, and find that the training process undergoes a phase transition from expansion to compression under disparate training regimes. Surprisingly, this phenomenon is ubiquitous across a wide variety of model architectures, optimizers, and data sets. We demonstrate that the variation in the intrinsic dimension is consistent with the complexity of the learned hypothesis, which can be quantitatively assessed by the critical sample ratio rooted in adversarial robustness. Meanwhile, we show mathematically that this phenomenon can be analyzed in terms of the mutable correlation between neurons. Although evoked activities obey a power-law decay rule in biological circuits, we find that the power-law exponent of the representations in deep neural networks predicts adversarial robustness well only at the end of training, not during it. Together, these results suggest that deep neural networks produce robust representations by adaptively eliminating or retaining redundancies. The code is publicly available at https://github.com/cltan023/learning2022 .
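The two quantities this abstract turns on, the intrinsic dimension of a layer's representations and the power-law exponent of their eigenspectrum, can both be estimated from an activation matrix. Below is a minimal Python sketch, assuming a hypothetical `activations` array of shape (n_samples, n_features); it uses the TwoNN estimator (Facco et al., 2017), a common choice for this kind of analysis, and a log-log least-squares fit for the spectral exponent. The authors' actual pipeline may differ; see the linked repository.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(activations: np.ndarray) -> float:
    """Intrinsic dimension via the TwoNN estimator (Facco et al., 2017)."""
    # k=3 neighbors because the closest "neighbor" of each point is itself.
    nn = NearestNeighbors(n_neighbors=3).fit(activations)
    dists, _ = nn.kneighbors(activations)
    r1, r2 = dists[:, 1], dists[:, 2]
    mask = r1 > 0                      # drop duplicate points
    mu = r2[mask] / r1[mask]           # second- over first-neighbor distance
    # mu is Pareto-distributed with shape d; maximum-likelihood estimate:
    return len(mu) / np.sum(np.log(mu))

def spectral_exponent(activations: np.ndarray, n_fit: int = 100) -> float:
    """Power-law decay exponent of the covariance eigenspectrum."""
    cov = np.cov(activations, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1][:n_fit]
    ranks = np.arange(1, len(eig) + 1)
    # Fit log(eigenvalue) ~ -alpha * log(rank) + c by least squares.
    slope, _ = np.polyfit(np.log(ranks), np.log(eig), 1)
    return -slope

# Tracking the expansion-compression transition would mean re-running
# these estimators on the same layer at successive training epochs.
X = np.random.randn(2000, 256)  # stand-in for real layer activations
print(twonn_id(X), spectral_exponent(X))
```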


Bibliographic Details
Published in: International Journal of Machine Learning and Cybernetics, 2024-11, Vol. 15 (11), p. 5381-5394
Main Authors: Tan, Chengli, Zhang, Jiangshe, Liu, Junmin, Zhao, Zixiang
Format: Article
Language:English
Subjects: Artificial Intelligence; Artificial neural networks; Biological activity; Biological effects; Complex Systems; Computational Intelligence; Control; Datasets; Engineering; Hypotheses; Learning curves; Machine learning; Mechatronics; Neural networks; Pattern Recognition; Phase transitions; Power law; Random variables; Representations; Robotics; Robustness; Systems Biology; Training
ISSN: 1868-8071
EISSN: 1868-808X
DOI: 10.1007/s13042-024-02244-x
Publisher: Springer Berlin Heidelberg
Online Access: https://doi.org/10.1007/s13042-024-02244-x