Low-dimensional intrinsic dimension reveals a phase transition in gradient-based learning of deep neural networks
Published in: International Journal of Machine Learning and Cybernetics, 2024-11, Vol. 15 (11), pp. 5381-5394
Main Authors: Tan, Chengli; Zhang, Jiangshe; Liu, Junmin; Zhao, Zixiang
Format: Article
Language: English
ISSN: 1868-8071 (print); 1868-808X (electronic)
DOI: 10.1007/s13042-024-02244-x
Publisher: Springer Berlin Heidelberg
Subjects: Artificial Intelligence; Artificial neural networks; Biological activity; Biological effects; Complex Systems; Computational Intelligence; Control; Datasets; Engineering; Hypotheses; Learning curves; Machine learning; Mechatronics; Neural networks; Original Article; Pattern Recognition; Phase transitions; Power law; Random variables; Representations; Robotics; Robustness; Systems Biology; Training
Abstract: Deep neural networks perform feature extraction by propagating inputs through multiple modules, yet how the representations evolve under gradient-based optimization remains unknown. Here we leverage the intrinsic dimension of the representations to study the learning dynamics and find that training undergoes a phase transition from expansion to compression under disparate training regimes. Surprisingly, this phenomenon is ubiquitous across a wide variety of model architectures, optimizers, and data sets. We demonstrate that the variation in the intrinsic dimension is consistent with the complexity of the learned hypothesis, which can be quantitatively assessed by the critical sample ratio rooted in adversarial robustness. We also show mathematically that this phenomenon can be analyzed in terms of the mutable correlation between neurons. Although evoked activities obey a power-law decay rule in biological circuits, we find that the power-law exponent of the representations in deep neural networks predicts adversarial robustness well only at the end of training, not during it. Together, these results suggest that deep neural networks produce robust representations by adaptively eliminating or retaining redundancies. The code is publicly available at https://github.com/cltan023/learning2022.
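
The abstract turns on two layer-level statistics: the intrinsic dimension (ID) of the representations and the power-law exponent of their eigenspectrum. As a rough illustration of how such quantities are commonly measured, the sketch below assumes the TwoNN estimator of Facco et al. (2017) for the ID and a log-log linear fit of the covariance eigenvalues for the exponent; these are standard choices but not necessarily the exact procedures of the paper or the learning2022 repository, and the `layer_statistics` helper, its hook-based activation capture, and all parameter names are illustrative.

```python
# Minimal sketch, assuming the TwoNN ID estimator and a log-log eigenspectrum
# fit; not taken from the learning2022 code.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors


def twonn_id(X: np.ndarray) -> float:
    """TwoNN maximum-likelihood intrinsic-dimension estimate (Facco et al., 2017).

    X is an (n_samples, n_features) array of one layer's representations.
    Under the TwoNN model, mu = r2/r1 (the ratio of second- to first-nearest-
    neighbor distance) is Pareto-distributed with shape d, so the MLE of the
    dimension is n / sum(log mu).
    """
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    # dist[:, 0] is each point's distance to itself (zero); columns 1 and 2
    # hold the first- and second-nearest-neighbor distances r1 and r2.
    mu = dist[:, 2] / np.maximum(dist[:, 1], 1e-12)
    mu = mu[mu > 1.0]  # drop duplicated points, which give log(mu) = 0
    return mu.size / np.sum(np.log(mu))


def spectral_exponent(X: np.ndarray) -> float:
    """Power-law exponent alpha of the covariance eigenspectrum.

    Fits log(eigenvalue) against log(rank) and returns the negated slope,
    i.e. alpha in lambda_k ~ k^(-alpha).
    """
    X = X - X.mean(axis=0)
    eig = np.linalg.svd(X, compute_uv=False) ** 2 / (len(X) - 1)
    eig = eig[eig > 1e-12]  # discard numerically zero directions
    slope, _ = np.polyfit(np.log(np.arange(1, eig.size + 1)), np.log(eig), 1)
    return -slope


def layer_statistics(model, layer, loader, n_max=2000):
    """Capture `layer`'s activations on a probe set; return (ID, alpha)."""
    feats = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: feats.append(out.flatten(1).detach().cpu()))
    model.eval()
    with torch.no_grad():
        for x, _ in loader:
            model(x)
            if sum(f.shape[0] for f in feats) >= n_max:
                break
    hook.remove()
    X = torch.cat(feats)[:n_max].numpy()
    return twonn_id(X), spectral_exponent(X)
```

Under these assumptions, logging both statistics for each layer every few epochs suffices to trace the expansion-to-compression transition the abstract describes, and to compare how well the exponent predicts robustness early versus late in training.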