Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding
Main Authors: | , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | Multi-GPU systems have become a popular platform to meet ever-growing application demands. However, employing multiple GPUs does not guarantee proportional performance improvements. While prior works have extensively studied optimizations to mitigate non-uniform memory access (NUMA) overheads, the address translation process also plays an important role in shaping overall execution performance. In this paper, we investigate the address translation process in multi-GPU systems under unified virtual memory (UVM). We specifically focus on the efficiency of the page table walk and identify three major latency penalties: i) queuing for available page table walk threads, ii) memory accesses on page walk cache misses, and iii) handling page faults. Based on our observations, we propose Trans-FW, which short-circuits the page table walk by leveraging substantial translation sharing and eager remote translation forwarding. Experimental results on 10 representative multi-GPU applications show that our proposed approach improves overall performance by 53.8% on average. |
ISSN: | 2378-203X |
DOI: | 10.1109/HPCA56546.2023.10071054 |
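The summary lists memory accesses on page walk cache misses as one of the three latency penalties. The following is a generic illustration of that penalty only. It is not the Trans-FW mechanism.

The sketch models a four-level radix page table walk with a small LRU page walk cache of upper-level (non-leaf) entries and counts how many memory accesses each walk needs. The geometry is an assumption borrowed from x86-64-style paging: 4 levels, 9 index bits per level, and 4 KB pages. GPU page tables under UVM may differ. `PageWalkCache` and `walk` are hypothetical names chosen for illustration.

```python
from collections import OrderedDict

# Assumed x86-64-style geometry; real GPU/UVM page tables may differ.
LEVELS = 4          # levels 0 (root) .. 3 (leaf PTE)
BITS_PER_LEVEL = 9  # 512 entries per table
PAGE_SHIFT = 12     # 4 KB pages


def level_prefix(vaddr: int, level: int) -> int:
    """Return the virtual-page-number bits that select the entry read at `level`."""
    vpn = vaddr >> PAGE_SHIFT
    shift = BITS_PER_LEVEL * (LEVELS - 1 - level)
    return vpn >> shift


class PageWalkCache:
    """Hypothetical LRU cache of non-leaf page-table entries keyed by (level, prefix)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key) -> bool:
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh LRU position on a hit
            return True
        return False

    def insert(self, key) -> None:
        self.entries[key] = True
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def walk(vaddr: int, pwc: PageWalkCache) -> int:
    """Return the number of page-table memory accesses needed to translate `vaddr`."""
    # Probe from the deepest non-leaf level upward: a hit at level L means the
    # walk can resume directly at level L + 1.
    start = 0
    for level in range(LEVELS - 2, -1, -1):
        if pwc.lookup((level, level_prefix(vaddr, level))):
            start = level + 1
            break
    # Every remaining level costs one memory access; cache the non-leaf entries read.
    for level in range(start, LEVELS - 1):
        pwc.insert((level, level_prefix(vaddr, level)))
    return LEVELS - start


if __name__ == "__main__":
    pwc = PageWalkCache(capacity=16)
    print(walk(0x7F0000001000, pwc))  # cold walk: 4 accesses
    print(walk(0x7F0000002000, pwc))  # neighbouring page: 1 access (level-2 entry cached)
```

In this model a cold walk touches all four levels. A walk to a neighbouring page reuses the cached level-2 entry and needs only the leaf access. Every miss in the page walk cache adds one memory access, which is why the cost grows with poor cache locality.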