TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting
Main Authors: | |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | As Deep Neural Networks (DNNs) grow deeper and larger, training them on existing accelerators (e.g., GPUs) becomes challenging due to limited device memory capacity. Existing memory management systems reduce the memory footprint via tensor offloading and recomputation. However, this coarse-grained, one-tensor-at-a-time memory management often incurs high peak GPU memory usage and cannot fully utilize available hardware resources (e.g., PCIe). In this paper, we propose TSPLIT, a fine-grained DNN memory management system that breaks apart memory bottlenecks while maintaining the efficiency of DNN training. TSPLIT achieves this with a model-guided approach that holistically exploits tensor splitting and its joint optimization with out-of-core execution methods (offloading and recomputation). We further provide an efficient implementation of TSPLIT with a splittable tensor abstraction, a profiling-based planner, and an optimized DNN runtime. Evaluations on 6 DNN models show that, compared to vDNN and SuperNeurons, TSPLIT increases the maximum trainable model size by up to 10.5× and 3.1× and improves throughput by up to 4.7× and 2.7× under the same memory over-subscription, respectively (see the illustrative sketch below). |
ISSN: | 2375-026X |
DOI: | 10.1109/ICDE53745.2022.00241 |
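The summary above describes tensor splitting only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the general intuition: a large logical tensor stored as independently allocated micro-tensors, of which only some are offloaded to pinned host memory on a side CUDA stream (so the PCIe transfer can overlap with compute) and prefetched back before reuse. The class name `SplitTensor`, its methods, and the 2-of-4 split choice are illustrative assumptions, not TSPLIT's actual splittable tensor abstraction, planner, or runtime.

```python
import torch


class SplitTensor:
    """Hypothetical sketch of a "splittable tensor": one logical tensor stored
    as independently allocated micro-tensors, so individual pieces can be
    offloaded to pinned host memory (and prefetched back) on a side stream
    while the rest stays on the GPU. Not TSPLIT's actual abstraction."""

    def __init__(self, rows: int, cols: int, n_splits: int = 4):
        assert rows % n_splits == 0, "keep the sketch simple: even splits only"
        # Separate allocations (not views of one big tensor), so dropping a
        # piece really releases its GPU memory.
        self.parts = [
            torch.randn(rows // n_splits, cols, device="cuda")
            for _ in range(n_splits)
        ]
        self.copy_stream = torch.cuda.Stream()

    def offload(self, indices):
        """Copy the chosen micro-tensors to pinned CPU buffers asynchronously,
        overlapping the device-to-host transfer with compute on the default stream."""
        with torch.cuda.stream(self.copy_stream):
            for i in indices:
                gpu_part = self.parts[i]
                host = torch.empty(gpu_part.shape, dtype=gpu_part.dtype,
                                   device="cpu", pin_memory=True)
                host.copy_(gpu_part, non_blocking=True)
                # Tell the caching allocator the copy stream still reads this
                # block, so it is not reused before the transfer finishes.
                gpu_part.record_stream(self.copy_stream)
                self.parts[i] = host  # GPU storage becomes reclaimable

    def prefetch(self, indices):
        """Bring offloaded micro-tensors back to the GPU before they are needed."""
        with torch.cuda.stream(self.copy_stream):
            for i in indices:
                self.parts[i] = self.parts[i].to("cuda", non_blocking=True)

    def wait(self):
        """Block the host until all pending copies on the side stream finish."""
        self.copy_stream.synchronize()


if __name__ == "__main__" and torch.cuda.is_available():
    act = SplitTensor(rows=4096, cols=4096, n_splits=4)
    act.offload([0, 1])   # only half of the logical tensor leaves the GPU
    act.wait()
    act.prefetch([0, 1])  # restore it before (say) the backward pass needs it
    act.wait()
    print([p.device.type for p in act.parts])
```

Compared with offloading an entire tensor at once, splitting lets a planner choose exactly how many micro-tensors leave the GPU and when, which is the fine-grained control the abstract argues coarse-grained, one-tensor-at-a-time schemes lack.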