Layer-Wisely Supervised Learning For One-Shot Neural Architecture Search
Main Authors:
Format: Conference Proceeding
Language: English
Summary: Neural architecture search aims to automatically discover neural architectures that are both efficient and effective. Recently, one-shot neural architecture search (one-shot NAS) has drawn great attention due to its high efficiency and competitive performance. One of the most important problems in one-shot NAS is evaluating the capabilities of architecture candidates; in particular, a pre-trained super-net serves as the evaluator. Due to the large weight-sharing space, current one-shot methods suffer from the ranking disorder issue: the ranking of candidates by estimated capability does not agree with their ranking by true capability. Moreover, the super-net used during search is dense, so training it with end-to-end back-propagation is inefficient. In this paper, we propose to modularize the large weight-sharing space of one-shot NAS into layers by introducing layer-wisely supervised learning. However, we find that greedy layer-wise learning, which trains each layer separately with a local objective, hurts both super-net performance and ranking correlation. Instead, we train each layer using the gradients propagated from the objective associated with the adjacent upper layer. This simple proposal reduces representation shift and improves ranking correlation. In addition, it reduces the memory footprint by 47.4% and achieves faster convergence of super-net training compared with a strong baseline. Extensive experiments on ImageNet with both supervised and self-supervised objectives demonstrate the effectiveness of our proposal.
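The summary describes the core mechanism well enough to sketch: each layer of the super-net is trained with gradients propagated from the objective attached to the adjacent upper layer, rather than with a purely greedy local objective or full end-to-end back-propagation. The PyTorch-style code below is a minimal, hypothetical illustration of that one-hop feedback scheme, not the authors' implementation; the function name `layerwise_step`, the auxiliary classifier `heads`, and the per-block optimizer grouping are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def layerwise_step(blocks, heads, optimizers, x, y):
    """One training step of the (assumed) one-hop feedback scheme.

    blocks[k]     : k-th backbone block of the super-net
    heads[k]      : auxiliary classifier attached after block k
    optimizers[k] : owns blocks[k] plus the auxiliary head used in its loss
    """
    # Cache detached inputs so no objective back-propagates more than
    # one block below the layer it is attached to.
    acts = [x]
    with torch.no_grad():
        for block in blocks[:-1]:
            acts.append(block(acts[-1]))

    n = len(blocks)
    for k in range(n):
        if k < n - 1:
            # Gradient path: heads[k+1] -> blocks[k+1] -> blocks[k],
            # stopping at the detached activation acts[k].
            out = heads[k + 1](blocks[k + 1](blocks[k](acts[k])))
        else:
            # The top block has no upper layer, so it falls back to
            # its own local objective.
            out = heads[k](blocks[k](acts[k]))
        loss = F.cross_entropy(out, y)
        # zero_grad also discards stale gradients left on blocks[k] and the
        # paired head by the previous iteration's backward pass.
        optimizers[k].zero_grad()
        loss.backward()
        optimizers[k].step()  # updates blocks[k] (and its paired head) only

# Toy usage with made-up shapes, just to show the expected wiring.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(4)])
heads = nn.ModuleList([nn.Linear(32, 10) for _ in range(4)])
optimizers = [
    torch.optim.SGD(
        list(blocks[k].parameters()) + list(heads[min(k + 1, 3)].parameters()), lr=0.1
    )
    for k in range(4)
]
layerwise_step(blocks, heads, optimizers, torch.randn(8, 32), torch.randint(0, 10, (8,)))
```

Because every block's input is detached, each auxiliary objective only needs activations for two blocks at a time, which is the intuition behind the reported memory savings relative to end-to-end super-net training.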
ISSN: 2161-4407
DOI: 10.1109/IJCNN55064.2022.9892632