ps-CALR: Periodic-Shift Cosine Annealing Learning Rate for Deep Neural Networks
Published in: IEEE Access, 2023, Vol. 11, pp. 139171-139186
Format: Article
Language: English
Summary: There are continued efforts to build on the performance of deep learning (DL) models in various fields of application. Developing new DL models continues to open unprecedented opportunities in diverse application areas despite the enormous resources required. Generally, the learning mechanism of a DL model depends on a cost function (CF), also called a loss function (LF); DL models require varied hyperparameter settings, and in particular parameters that help the model continually minimize the cost function until faster convergence, with better generalization over the data in the loss landscape, is achieved. The learning rate (LR) update seeks to find the optimal solution for DL models through relative cost function minimization. Selecting an appropriate LR is therefore essential to the performance of DL models. Despite its demonstrated fast model convergence, the existing cosine annealing LR schedule lacks complete loss-landscape exploration of the flat minima, which limits the generalization it can achieve. To address this, the paper proposes a periodic-shift cosine annealing learning rate with warm-up epochs (ps-CALR) to perturb the LR update. Six publicly available datasets were used to benchmark the proposed LR method, experimenting with custom DL models (multilayer perceptrons and convolutional neural networks) and pre-trained DL models. The proposed ps-CALR enhances model generalization and convergence, yielding notably better performance than a fixed LR and the existing cosine annealing method.
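The record does not give the exact ps-CALR formula, so the following is only a minimal sketch of the general idea it describes: standard cosine annealing (lr = lr_min + ½(lr_max − lr_min)(1 + cos(πt/T))) preceded by a linear warm-up, with a hypothetical phase-shift parameter `shift` added to perturb the annealing period. The function name, parameter names, and default values are illustrative assumptions, not the authors' implementation.

```python
import math

def ps_calr(epoch, total_epochs, lr_max=0.1, lr_min=0.0,
            warmup_epochs=5, shift=0.25):
    """Sketch of a period-shifted cosine annealing LR with warm-up.

    NOTE: the exact ps-CALR schedule is not given in this record; this
    combines textbook cosine annealing with a linear warm-up and a
    hypothetical phase offset `shift` (as a fraction of a half-period).
    """
    if epoch < warmup_epochs:
        # Linear warm-up: ramp the LR from lr_max/warmup_epochs to lr_max.
        return lr_max * (epoch + 1) / warmup_epochs
    # Fractional progress t in [0, 1] through the annealing phase.
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    # Shift the cosine phase to perturb the usual monotone decay.
    phase = math.pi * (t + shift)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(phase))
```

In a training loop this would be called once per epoch to set the optimizer's learning rate; because (1 + cos(·))/2 stays in [0, 1], the schedule remains bounded between lr_min and lr_max for any shift.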
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3340719