Combining Process-Based Cache Partitioning and Pollute Region Isolation to Improve Shared Last Level Cache Utilization on Multicore Systems

Bibliographic Details
Main Authors: Tao Huang, Jing Wang, Xuetao Guan, Qi Zhong, Keyi Wang
Format: Conference Proceeding
Language: English
Description
Summary: Shared last level caches are widely used in modern multicore processors. However, uncontrolled cache sharing on a multicore processor causes more serious cache pollution than on a single-core processor: a process with weak locality can evict strong-locality data sets that belong to other concurrent processes. Processes in a multiprocessing environment therefore always affect each other on multicore systems with a shared last level cache. Prior approaches either partition the shared cache at the process level to reduce inter-process cache contention, or isolate non-temporal memory accesses to accelerate the execution of a single application. Process-based cache partitioning, however, may make intra-process cache pollution more serious and can strongly affect the performance of an individual process. In this work, we take an alternative view and explore physical page layout optimization that combines process-based cache partitioning with pollute region isolation to improve shared last level cache utilization on multicore systems. Our approach has three steps. The first step determines the cache sizes of the co-scheduled applications. The second step recognizes the weak-locality regions of each application under different cache size configurations. The third step customizes the physical page layout to partition the cache space among concurrent processes and sets up a global pollute buffer that maps pollute regions into a small slice of the shared last level cache. The approach can be used directly on commercial multicore systems without any additional hardware. Our experimental results show that, compared with the default Linux memory management scheme, our approach improves performance by 26.73% on average. Even compared with process-based cache partitioning using RapidMRC, our approach further eliminates the harmful effect of non-reusable data and improves system performance by a further 5.63% on average.
ISSN: 2324-898X, 2324-9013
DOI: 10.1109/TrustCom.2013.139
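
The third step in the summary, customizing the physical page layout, can be illustrated with a short sketch. It assumes the layout is realized through page coloring on a physically indexed, set-associative cache, which the summary does not state explicitly. The function names, the 4 KiB page size, the proportional split, and the choice to reserve the highest colors for the pollute buffer are all hypothetical and are not taken from the paper.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, not given in the summary


def num_colors(cache_bytes, ways, page_size=PAGE_SIZE):
    """Number of page colors: distinct cache slices a physical page can map to."""
    return cache_bytes // (ways * page_size)


def page_color(phys_addr, colors, page_size=PAGE_SIZE):
    """Color of the physical page that contains phys_addr."""
    return (phys_addr // page_size) % colors


def plan_layout(colors, shares, pollute_colors):
    """Split colors among processes in proportion to `shares` (hypothetical
    per-process cache sizes from step one) and reserve the last
    `pollute_colors` colors as a shared pollute buffer."""
    usable = colors - pollute_colors
    total = sum(shares.values())
    names = list(shares)
    layout, start = {}, 0
    for i, name in enumerate(names):
        # The last process takes whatever remains after integer rounding.
        if i == len(names) - 1:
            count = usable - start
        else:
            count = usable * shares[name] // total
        layout[name] = list(range(start, start + count))
        start += count
    layout["pollute_buffer"] = list(range(usable, colors))
    return layout


if __name__ == "__main__":
    colors = num_colors(4 * 1024 * 1024, ways=16)  # illustrative 4 MiB, 16-way cache
    layout = plan_layout(colors, {"A": 3, "B": 1}, pollute_colors=4)
    for owner, owned in layout.items():
        print(owner, len(owned), "colors")
```

In such a scheme, an allocator would serve each process only physical pages whose color is in that process's list, and would back the weak-locality regions identified in step two with pages from the pollute buffer colors, so that non-reusable data competes only for that small slice of the cache.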