
Randomized block-coordinate adaptive algorithms for nonconvex optimization problems


Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, 2023-05, Vol. 121, p. 105968, Article 105968
Main Authors: Zhou, Yangfan, Huang, Kaizhu, Li, Jiang, Cheng, Cheng, Wang, Xuguang, Hussain, Amir, Liu, Xin
Format: Article
Language: English
Description
Summary: Nonconvex optimization problems have long been a central focus in deep learning, where many fast momentum-based adaptive algorithms are applied. However, computing the full gradient of a high-dimensional parameter vector in such tasks becomes prohibitive. To reduce the computational cost of optimizers on the nonconvex optimization problems typically seen in deep learning, this work proposes a randomized block-coordinate adaptive optimization algorithm, named RAda, which randomly picks a block from the full coordinates of the parameter vector and then sparsely computes its gradient. We prove that, in nonconvex cases, RAda converges to a δ-accurate solution with a stochastic first-order complexity of O(1/δ²), where δ is the upper bound on the gradient's square. Experiments on public datasets, including CIFAR-10, CIFAR-100, and Penn TreeBank, verify that RAda outperforms the compared algorithms in terms of computational cost.
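
The summary describes the core mechanism: sample a block of coordinates, compute the gradient sparsely on that block only, then apply a momentum-based adaptive update to those coordinates. Below is a minimal illustrative sketch of such a step in the style of a block-coordinate Adam update; the names (rada_style_step, grad_block_fn) and hyperparameter defaults are hypothetical, and the paper's actual RAda algorithm may differ in its block-sampling scheme and update rule.

    import numpy as np

    def rada_style_step(w, grad_block_fn, m, v, t, block_size,
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One randomized block-coordinate Adam-style update (illustration only)."""
        d = w.size
        # Randomly pick a block of coordinates from the full parameter vector.
        idx = np.random.choice(d, size=block_size, replace=False)
        # Sparsely compute the gradient on the sampled block only.
        g = grad_block_fn(w, idx)                 # returns shape (block_size,)
        # Momentum-based first- and second-moment estimates, updated blockwise.
        m[idx] = beta1 * m[idx] + (1.0 - beta1) * g
        v[idx] = beta2 * v[idx] + (1.0 - beta2) * g ** 2
        m_hat = m[idx] / (1.0 - beta1 ** t)       # bias correction
        v_hat = v[idx] / (1.0 - beta2 ** t)
        # Adaptive step applied to the sampled coordinates only.
        w[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Usage example: minimize f(w) = 0.5 * ||w||^2, whose block gradient is w[idx].
    w = np.random.randn(10)
    m, v = np.zeros(10), np.zeros(10)
    for t in range(1, 501):
        w, m, v = rada_style_step(w, lambda w, idx: w[idx], m, v, t, block_size=3)

Because each step touches only block_size of the d coordinates, the per-iteration gradient cost scales with the block size rather than the full dimension, which is the source of the computational savings the abstract claims.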
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2023.105968