Gradient estimation of information measures in deep learning

Bibliographic Details
Published in: Knowledge-Based Systems, 2021-07, Vol. 224, Article 107046
Main Authors: Wen, Liangjian; Bai, Haoli; He, Lirong; Zhou, Yiji; Zhou, Mingyuan; Xu, Zenglin
Format: Article
Language:English
Description
Summary: Information measures including entropy and mutual information (MI) have been widely applied in deep learning. Despite these successes, existing estimation methods suffer from either high variance or high bias, which may lead to unstable training or poor performance in deep learning. Since estimating information measures themselves is very difficult, we explore an appealing alternative strategy: directly estimating the gradients of information measures with respect to model parameters. We propose a general gradient estimation method for information measures based on score estimation. Specifically, we establish the Entropy Gradient Estimator (EGE) and the Mutual Information Gradient Estimator (MIGE) to estimate the gradients of entropy and mutual information with respect to model parameters, respectively. To optimize entropy or mutual information, we can directly plug in their gradient approximations with respect to the relevant parameters, enabling stochastic backpropagation for stability and efficiency. Our proposed method exhibits higher accuracy and lower variance for gradient estimation of information measures. Extensive experiments on various deep learning tasks demonstrate the superiority of our method.
ISSN: 0950-7051
1872-7409
DOI: 10.1016/j.knosys.2021.107046
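
For readers who want a concrete handle on the approach the abstract describes, the following is a minimal sketch of score-based entropy gradient estimation in the spirit of EGE. The abstract only says the method is based on score estimation; this sketch substitutes the Stein gradient estimator of Li & Turner (2017) with an RBF kernel as the score estimator, and every name in it (rbf_kernel, stein_score, entropy_surrogate, the toy Gaussian check, the bandwidth and ridge settings) is an illustrative assumption, not the authors' code.

# Sketch: entropy gradient via score estimation, not the authors' implementation.
import torch

def rbf_kernel(x, sigma):
    # RBF Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def stein_score(x, sigma=1.0, eta=1e-3):
    # Stein gradient estimator (Li & Turner, 2017) of grad_x log q(x)
    # at the samples x (n, d):  G_hat = -(K + eta I)^{-1} <grad, K>,
    # where <grad, K>_(i,d) = sum_j d k(x_i, x_j) / d x_(j,d).
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    grad_K = (K.sum(dim=1, keepdim=True) * x - K @ x) / sigma ** 2
    return -torch.linalg.solve(K + eta * torch.eye(n), grad_K)

def entropy_surrogate(x):
    # For reparameterized samples x = g_theta(eps), the entropy gradient is
    #   grad_theta H(q_theta) = -E[ grad_x log q(x) . dx/dtheta ],
    # so a surrogate loss whose autograd gradient matches it is
    #   -mean( stop_grad(score(x)) . x ).
    score = stein_score(x.detach())
    return -(score * x).sum(dim=1).mean()

# Toy check: Gaussian x = mu + exp(log_sigma) * eps has
# H = 0.5 * log(2*pi*e*sigma^2), hence dH/d(log_sigma) = 1.
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
eps = torch.randn(512, 1)
x = mu + log_sigma.exp() * eps
entropy_surrogate(x).backward()
print(log_sigma.grad)  # approx 1.0; mu.grad is approx 0

Because I(X; Y) = H(X) + H(Y) - H(X, Y), a mutual-information gradient in the spirit of MIGE can be assembled from entropy gradients of exactly this form, one per term; plugging the resulting gradient into a standard optimizer is what the abstract calls stochastic backpropagation on the information measure.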