
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

Bibliographic Details
Published in: arXiv.org 2022-11
Main Authors: Tomasev, Nenad; Bica, Ioana; McWilliams, Brian; Buesing, Lars; Pascanu, Razvan; Blundell, Charles; Mitrovic, Jovana
Format: Article
Language: English
Description
Summary: Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we include additional inductive biases into self-supervised learning. We propose a new self-supervised representation learning method, ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views to avoid learning spurious correlations and obtain more informative representations. ReLICv2 achieves \(77.1\%\) top-\(1\) accuracy on ImageNet under linear evaluation on a ResNet50, improving on the previous state-of-the-art by an absolute \(+1.5\%\); on larger ResNet models, ReLICv2 achieves up to \(80.6\%\), outperforming previous self-supervised approaches by margins of up to \(+2.3\%\). Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform the supervised baseline in a like-for-like comparison across a range of ResNet architectures. Using ReLICv2, we also learn more robust and transferable representations that generalize better out-of-distribution than previous work, on both image classification and semantic segmentation. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
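The abstract describes the method only at a high level. As a rough illustration of the core idea it names, pairing an explicit invariance term with a contrastive objective across data views, here is a minimal, hypothetical PyTorch sketch. It is not the authors' implementation: the function name relic_style_loss, the temperature and alpha parameters, and the choice of a KL divergence as the invariance term are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def relic_style_loss(z1, z2, temperature=0.1, alpha=1.0):
    """Hypothetical combined contrastive + invariance loss (not the paper's code).

    z1, z2: embeddings of two augmented views of the same batch, shape (N, D).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Cross-view similarity logits; row i compares sample i in one view
    # against every sample in the other view.
    logits12 = z1 @ z2.t() / temperature  # (N, N)
    logits21 = z2 @ z1.t() / temperature

    targets = torch.arange(z1.size(0), device=z1.device)

    # Contrastive (InfoNCE-style) term: the matching index is the positive.
    contrastive = 0.5 * (F.cross_entropy(logits12, targets)
                         + F.cross_entropy(logits21, targets))

    # Explicit invariance term: a KL divergence encouraging the similarity
    # distributions computed from the two views to agree.
    log_p12 = F.log_softmax(logits12, dim=1)
    log_p21 = F.log_softmax(logits21, dim=1)
    invariance = F.kl_div(log_p12, log_p21, reduction='batchmean', log_target=True)

    return contrastive + alpha * invariance
```

In this sketch, alpha trades off the contrastive term against the invariance regularizer; the paper's actual construction of data views and loss weighting may differ.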
ISSN: 2331-8422