An Information-Theoretic View of Generalization via Wasserstein Distance

Bibliographic Details
Main Authors:	Wang, Hao; Diaz, Mario; Santos Filho, Jose Candido S.; Calmon, Flavio P.
Format:	Conference Proceeding
Language:	English
Subjects:	Convergence; Gaussian noise; Measurement; Mutual information; Sociology; Training; Upper bound

Description
Summary:	We capitalize on the Wasserstein distance to obtain two information-theoretic bounds on the generalization error of learning algorithms. First, we specialize the Wasserstein distance into total variation, by using the discrete metric. In this case we derive a generalization bound and, from a strong data-processing inequality, show how to narrow the bound by adding Gaussian noise to the output hypothesis. Second, we consider the Wasserstein distance under a generic metric. In this case we derive a generalization bound by exploiting the geometric nature of the Kantorovich-Rubinstein duality theorem. We illustrate the use of these bounds with examples. Our bounds can handle certain cases in which existing bounds via mutual information fail.
ISSN:	2157-8117
DOI:	10.1109/ISIT.2019.8849359
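
The summary's first step, specializing the Wasserstein distance to total variation through the discrete metric, can be checked on a toy example. The sketch below is not taken from the paper: the distributions `p` and `q` and all function names are hypothetical, chosen only for illustration. It compares the cost of one particular coupling (an upper bound on the Wasserstein distance) with the value of one 1-Lipschitz test function in the Kantorovich-Rubinstein dual (a lower bound). Under the discrete metric d(x, y) = 1 if x != y and 0 otherwise, both coincide with the total variation distance.

```python
# Toy check, assuming finite distributions given as probability lists
# over the same support. All names here are illustrative assumptions.

def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def primal_cost_maximal_coupling(p, q):
    """Transport cost of a coupling that leaves min(p_i, q_i) mass in place.

    Under the discrete metric, mass that stays put costs 0 and any mass
    that moves costs 1, so the cost is the total mass that has to move.
    This is an upper bound on the Wasserstein distance.
    """
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))

def dual_value_indicator(p, q):
    """Dual objective for f = indicator of the set {i : p_i > q_i}.

    An indicator function is 1-Lipschitz under the discrete metric, so
    E_p[f] - E_q[f] is a lower bound on the Wasserstein distance.
    """
    return sum(a - b for a, b in zip(p, q) if a > b)

p = [0.5, 0.3, 0.2]  # hypothetical distribution
q = [0.2, 0.3, 0.5]  # hypothetical distribution

tv = total_variation(p, q)
upper = primal_cost_maximal_coupling(p, q)
lower = dual_value_indicator(p, q)
print(tv, upper, lower)
```

Because the upper and lower bounds agree, the Wasserstein distance under the discrete metric equals the total variation distance for this pair, which is the identity the first bound in the summary relies on.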