Loading…

PSACS: Highly-Parallel Shuffle Accelerator on Computational Storage

Shuffle is an indispensable process in distributed online analytical processing systems to enable task-level parallelism exploitation via multiple nodes. As a data-intensive data reorganization process, shuffle implemented on general-purpose CPUs not only incurs data traffic back and forth between t...

Full description

Saved in:
Bibliographic Details
Main Authors: Zou, Chen, Zhang, Hui, Chien, Andrew A., Seok Ki, Yang
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Shuffle is an indispensable process in distributed online analytical processing systems to enable task-level parallelism exploitation via multiple nodes. As a data-intensive data reorganization process, shuffle implemented on general-purpose CPUs not only incurs data traffic back and forth between the computing and storage resources, but also pollutes the cache hierarchy with almost zero data reuse. As a result, shuffle can easily become the bottleneck of distributed analysis pipelines.Our PSACS approach attacks these bottlenecks with the rising computational storage paradigm. Shuffle is offloaded to the storage-side PSACS accelerator to avoid polluting computing node memory hierarchy and enjoy the latency, bandwidth and energy benefits of near-data computing. Further, the microarchitecture of PSACS exploits data-, subtask-, and task-level parallelism for high performance and a customized scratchpad for fast on-chip random access.PSACS achieves 4.6x-5.7x shuffle throughput at kernel-level and up to 1.3x overall shuffle throughput with only a twentieth of CPU utilization comparing to software baselines. These mount up to 23% end-to-end OLAP query speedup on average.
ISSN:2576-6996
DOI:10.1109/ICCD53106.2021.00080