Stochastic Gradient Descent for Kernel-Based Maximum Correntropy Criterion

Bibliographic Details
Published in: Entropy (Basel, Switzerland), 2024-12, Vol. 26 (12), p. 1104
Main Authors: Li, Tiankai, Wang, Baobin, Peng, Chaoquan, Yin, Hong
Format: Article
Language: English
Summary: The maximum correntropy criterion (MCC) has become an important method in the machine learning and signal processing communities since its successful application to various non-Gaussian noise scenarios. In contrast to the classical least squares (LS) method, which takes only second-order moments of the model into account and leads to a convex optimization problem, MCC captures higher-order information that plays a crucial role in robust learning, usually at the cost of solving a non-convex optimization problem. Theoretical research on convex optimization has made significant achievements, whereas the theoretical understanding of non-convex optimization is still far from mature. Motivated by the popularity of stochastic gradient descent (SGD) for solving non-convex problems, this paper considers SGD applied to the kernel version of MCC, which has been shown to be robust to outliers and non-Gaussian data in nonlinear models. Since existing theoretical results for the SGD algorithm applied to kernel MCC are not well established, we present a rigorous analysis of its convergence behavior and provide explicit convergence rates under standard conditions. Our work fills the gap between the optimization process and convergence during the iterations: the iterates need to converge to the global minimizer, even though the estimator obtained during the learning process cannot guarantee global optimality.
ISSN: 1099-4300
DOI: 10.3390/e26121104
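
To make the method described in the summary concrete, the following is a minimal, illustrative sketch of one-pass stochastic gradient descent for a kernel MCC regression estimator in Python. It is not the paper's algorithm or notation: the Gaussian kernel, the correntropy-induced loss l(u) = sigma^2 * (1 - exp(-u^2 / (2 sigma^2))), the decaying step size, and the hyperparameter names (sigma, gamma, eta) are assumptions made purely for illustration.

import numpy as np

def gaussian_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def kernel_mcc_sgd(X, y, sigma=1.0, gamma=1.0, eta=0.5):
    # One pass of SGD for kernel MCC regression: the estimator is kept in the
    # representer form f = sum_i alpha_i * K(x_i, .), and each step performs
    #   f_{t+1} = f_t - eta_t * l'(f_t(x_t) - y_t) * K(x_t, .)
    # with l'(u) = u * exp(-u^2 / (2 sigma^2)). This derivative is bounded, so
    # a single outlying sample can move the iterate only by a limited amount.
    centers, alphas = [], []
    for t, (x_t, y_t) in enumerate(zip(X, y), start=1):
        f_xt = sum(a * gaussian_kernel(c, x_t, gamma) for c, a in zip(centers, alphas))
        u = f_xt - y_t                                  # residual of the current iterate
        grad = u * np.exp(-u ** 2 / (2 * sigma ** 2))   # derivative of the correntropy-induced loss
        eta_t = eta / np.sqrt(t)                        # decaying step size (a common, assumed choice)
        centers.append(x_t)
        alphas.append(-eta_t * grad)
    return centers, alphas

def predict(centers, alphas, x, gamma=1.0):
    # Evaluate the learned function f(x) = sum_i alpha_i * K(x_i, x)
    return sum(a * gaussian_kernel(c, x, gamma) for c, a in zip(centers, alphas))

# Toy usage: noisy linear data with a few gross outliers injected
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(200)
y[::25] += 10.0
centers, alphas = kernel_mcc_sgd(X, y, sigma=1.0, gamma=2.0, eta=0.5)
print(predict(centers, alphas, np.array([0.5]), gamma=2.0))

Because the derivative of the correntropy-induced loss is bounded, each update gives outliers at most a fixed influence, which is the robustness property the summary refers to; replacing this loss with the squared loss would recover an SGD scheme for kernel least squares.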