
On Scheduling Early-exit Layers for Model Pipeline in 6G-based Edge Inference

Bibliographic Details
Published in: IEEE Network, 2024-12, p. 1-1
Main Authors: Liu, Yuxiao, Han, Rui, Zhang, Qinglong, Hou, Haiting, Liu, Chi Harold, Chen, Lydia Y.
Format: Article
Language: English
Description
Summary: When running edge intelligence applications over 6G networks, model pipelining effectively reduces inference latency by parallelizing layers across multiple edge devices. Today's edge inference systems usually employ a static layer architecture in pipeline parallelism but dynamically skip some layers via early exit, which may significantly degrade system throughput. In this paper, we introduce DensePipe, an online layer scheduling approach that optimally allocates early-exit layers to edge devices to maximize their throughput in the model pipeline. To this end, DensePipe profiles every network layer's skipping probability under early exit. At run-time, DensePipe maximizes pipeline throughput by balancing the processing of all unskipped layers among devices according to their current loads and resource utilizations. We implement DensePipe with Transformer models and demonstrate its effectiveness against state-of-the-art pipeline methods. Comparative experiments show that DensePipe successfully finds the best device for most layers and significantly improves throughput by 3.09x.
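The balancing idea in the summary — weighting each layer's cost by its probability of being executed (i.e., not skipped by early exit) and then splitting the pipeline so every device carries a similar expected load — can be sketched as follows. This is an illustrative reconstruction, not DensePipe's published algorithm: the per-layer costs, skip probabilities, and the greedy contiguous-partition heuristic are all assumptions for the sake of the example.

```python
def expected_costs(costs, skip_probs):
    """Expected per-layer compute cost: a layer skipped with probability p
    contributes cost * (1 - p) to the pipeline load on average."""
    return [c * (1.0 - p) for c, p in zip(costs, skip_probs)]

def balance_partition(costs, skip_probs, n_devices):
    """Greedily split layers into contiguous groups (one per device) so that
    each group's expected cost is close to total / n_devices.

    Pipeline parallelism requires contiguous layer ranges, so we only choose
    where to place the n_devices - 1 cut points.
    """
    exp = expected_costs(costs, skip_probs)
    target = sum(exp) / n_devices
    parts, current, acc = [], [], 0.0
    for i, e in enumerate(exp):
        current.append(i)
        acc += e
        remaining_layers = len(exp) - i - 1
        remaining_cuts = n_devices - len(parts) - 1
        # Cut once this device has reached its target share, but never leave
        # fewer layers than the devices still waiting for a group.
        if remaining_cuts > 0 and acc >= target and remaining_layers >= remaining_cuts:
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts

# Hypothetical profile: later layers are skipped half the time by early exit.
layers = balance_partition(
    costs=[4.0, 4.0, 2.0, 2.0],
    skip_probs=[0.0, 0.0, 0.5, 0.5],
    n_devices=2,
)
# Expected loads are [4, 4, 1, 1]; the balanced cut is after layer 1.
print(layers)  # → [[0, 1], [2, 3]]
```

A static partition by raw cost would cut the same four layers unevenly once early exit starts skipping the tail; weighting by skip probability is what lets the schedule keep devices equally busy at run-time.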
ISSN:0890-8044
DOI:10.1109/MNET.2024.3520555