Cluster-aware scheduling in multitasking GPUs

Bibliographic Details
Published in:	Real-time systems 2024-03, Vol.60 (1), p.1-23
Main Authors:	Zhao, Xia; Wang, Huiquan; Huang, Anwen; Wang, Dongsheng; Zhang, Guangda
Format:	Article
Language:	English
Subjects:	Clusters; Communications Engineering; Computer Science; Computer Systems Organization and Communication Networks; Control; Mechatronics; Multiprocessing; Multitasking; Networks; Performance and Reliability; Robotics; Scheduling; Special Purpose and Application-Based Systems; System on chip

Description
Summary:	The streaming multiprocessor (SM) count in GPUs continues to increase to provide high computing power. To construct a scalable crossbar network that connects the SMs to the last-level cache (LLC) slices and memory controllers, GPUs adopt a cluster structure in which a group of SMs shares a network port. Unfortunately, current GPU spatial multitasking is unaware of this underlying network-on-chip infrastructure, which poses both challenges and opportunities for performance. In this paper, we observe that, compared to cluster-unaware multitasking, taking into account the cluster structure, the SM partition within a cluster, and the injecting policy for sharing the network port can bring significant performance improvement. We then propose a low-cost online profiling and scheduling policy that consists of two steps: the cluster-aware scheduling first determines the best SM partition within a cluster and then finds a proper injecting policy between the two co-executing applications. Both steps are performed through online profiling, which incurs only limited runtime overhead. The evaluation results show that, across all workloads, our cluster-aware multitasking increases system throughput by 12.9% on average (and up to 76.5%).
ISSN:	0922-6443 1573-1383
DOI:	10.1007/s11241-023-09409-x