A Method on Searching Better Activation Functions
Published in: | arXiv.org 2024-05 |
---|---|
Main Authors: | Sun, Haoyuan; Wu, Zihao; Xia, Bo; Chang, Pu; Dong, Zibin; Yuan, Yifu; Chang, Yongzhe; Wang, Xueqian |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks; Boundary conditions; Entropy; Entropy (Information theory); Entropy of activation; Large language models; Methodology; Neural networks; Taylor series |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Sun, Haoyuan; Wu, Zihao; Xia, Bo; Chang, Pu; Dong, Zibin; Yuan, Yifu; Chang, Yongzhe; Wang, Xueqian |
description | The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, which introduces non-linearity into the network and enables it to model sophisticated relationships in data. However, the search for activation functions has largely relied on empirical knowledge, lacking theoretical guidance, which has hindered the identification of more effective activation functions. In this work, we offer a solution to this issue. First, we theoretically demonstrate the existence of the worst activation function with boundary conditions (WAFBC) from the perspective of information entropy. Furthermore, inspired by the Taylor-expansion form of the information entropy functional, we propose the Entropy-based Activation Function Optimization (EAFO) methodology. EAFO offers a novel perspective on designing static activation functions in deep neural networks and points to the potential of dynamically optimizing activations during iterative training. Using the EAFO methodology, we derive a novel activation function from ReLU, termed Correction Regularized ReLU (CRReLU). Experiments conducted with vision transformers and their variants on the CIFAR-10, CIFAR-100 and ImageNet-1K datasets demonstrate the superiority of CRReLU over existing corrections of ReLU. In extensive empirical studies on the task of large language model (LLM) fine-tuning, CRReLU exhibits superior performance compared to GELU, suggesting its broader potential for practical applications. (A hedged code sketch of this correction idea follows the record fields below.) |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-05 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3058328239 |
source | Access via ProQuest (Open Access) |
subjects | Artificial neural networks; Boundary conditions; Entropy; Entropy (Information theory); Entropy of activation; Large language models; Methodology; Neural networks; Taylor series |
title | A Method on Searching Better Activation Functions |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T10%3A36%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20Method%20on%20Searching%20Better%20Activation%20Functions&rft.jtitle=arXiv.org&rft.au=Sun,%20Haoyuan&rft.date=2024-05-22&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3058328239%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_30583282393%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3058328239&rft_id=info:pmid/&rfr_iscdi=true |
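The record does not include the functional form of CRReLU, only that it is a correction of ReLU with a component optimized during training via the EAFO methodology. The sketch below is therefore an illustration under stated assumptions: the class name `CRReLU`, the `eps_init` parameter, and the Gaussian-weighted correction term `x * exp(-x**2 / 2)` are assumptions made for this example, not details taken from the record; the exact form used in the paper may differ.

```python
import torch
import torch.nn as nn


class CRReLU(nn.Module):
    """Sketch of a correction-regularized ReLU-style activation.

    Assumption: ReLU plus a small learnable, Gaussian-weighted correction
    term; the precise definition in the paper may differ.
    """

    def __init__(self, eps_init: float = 0.01):
        super().__init__()
        # Learnable correction strength, trained alongside the network weights.
        self.eps = nn.Parameter(torch.tensor(eps_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU branch plus the assumed correction x * exp(-x^2 / 2),
        # scaled by the learnable parameter eps.
        return torch.relu(x) + self.eps * x * torch.exp(-x.pow(2) / 2)


# Usage: drop-in replacement for nn.ReLU or nn.GELU in an MLP block.
act = CRReLU()
y = act(torch.randn(4, 8))
```

Because `eps` is an ordinary `nn.Parameter`, the optimizer updates it together with the other weights, which mirrors the abstract's point about dynamically optimizing the activation during iterative training.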