Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge

Bibliographic Details
Published in:	arXiv.org 2024-01
Main Authors:	Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick van de Waterlaat, Pavol Jancura
Format:	Article
Language:	English
Subjects:	Searching; Semantic segmentation

Description
Summary:	Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to small-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2401.12350