ICCI: In-Cache Coherence Information

Bibliographic Details
Published in: IEEE Transactions on Computers, 2015-04, Vol. 64 (4), pp. 995-1014
Main Authors: Garcia-Guirado, Antonio; Fernandez-Pascual, Ricardo; Garcia, Jose M.
Format: Article
Language: English
Description
Summary: In this paper we introduce ICCI, a new cache organization that leverages shared cache resources and flat coherence protocols to provide inexpensive hardware cache coherence for large core counts (e.g., 512), achieving execution times close to those of a nonscalable sparse directory while noticeably reducing the energy consumption of the memory system. Implementing ICCI requires only very simple changes to the system relative to traditional bit-vector directories. Moreover, ICCI introduces no storage overhead with respect to a broadcast-based protocol, yet it provides large storage space for coherence information. ICCI makes smarter use of cache resources by dynamically allowing last-level cache entries to store either blocks or sharing codes. This way, only the minimum number of directory entries required at runtime is allocated. ICCI also suffers a negligible number of directory-induced invalidations. Results for a 512-core CMP show that ICCI reduces the energy consumption of the memory system by up to 48 percent compared to a tag-embedded directory, up to 15 percent compared to a sparse directory, and up to 8 percent compared to the state-of-the-art Scalable Coherence Directory, which ICCI also outperforms in execution time. In addition, ICCI can be combined with elaborate sharing codes to extend it to extremely large core counts. We also show analytically that ICCI's dynamic allocation of entries makes it a suitable candidate for storing coherence information efficiently at very large core counts (e.g., over 200K cores), based on the observation that data sharing makes fewer directory entries necessary per core as core count increases.
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2014.2308185
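
The summary's central idea, that a last-level cache entry can hold either a data block or a sharing code so that directory storage is allocated only on demand, can be illustrated with a toy model. The sketch below is not the paper's protocol: the class names (`ToyLLC`, `Entry`, `Kind`), the rule that an entry switches to a sharing code when a core takes a private copy, and the 8-core size are all assumptions made for illustration. The only details taken from the summary are the dual-use entries and the bit-vector sharing code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Kind(Enum):
    DATA = "data"        # the entry holds the cache block itself
    SHARING = "sharing"  # the entry holds a bit-vector of sharer cores


@dataclass
class Entry:
    kind: Kind
    payload: int  # block contents (DATA) or sharer bit-vector (SHARING)


class ToyLLC:
    """Hypothetical last-level cache whose entries hold blocks or sharing codes."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.entries: Dict[int, Entry] = {}  # address -> entry

    def fill(self, addr: int, data: int) -> None:
        """Store a block that no private cache currently holds."""
        self.entries[addr] = Entry(Kind.DATA, data)

    def add_sharer(self, addr: int, core: int) -> None:
        """Record that `core` now holds a private copy of `addr`."""
        if not 0 <= core < self.num_cores:
            raise ValueError("core index out of range")
        entry = self.entries.get(addr)
        if entry is None or entry.kind is Kind.DATA:
            # Assumption: the same entry is reused for coherence information
            # once a private copy exists, so no separate directory is needed.
            entry = Entry(Kind.SHARING, 0)
            self.entries[addr] = entry
        entry.payload |= 1 << core

    def sharers(self, addr: int) -> List[int]:
        """Return the cores recorded as sharers of `addr`, in ascending order."""
        entry = self.entries.get(addr)
        if entry is None or entry.kind is Kind.DATA:
            return []
        return [c for c in range(self.num_cores) if (entry.payload >> c) & 1]

    def directory_entries(self) -> int:
        """Count entries currently used for sharing codes."""
        return sum(1 for e in self.entries.values() if e.kind is Kind.SHARING)


llc = ToyLLC(num_cores=8)
llc.fill(0x40, 123)
llc.fill(0x80, 7)
llc.add_sharer(0x40, 3)
llc.add_sharer(0x40, 5)
print(llc.sharers(0x40), llc.directory_entries())
```

In this toy, blocks that are never privately cached consume no directory storage at all, which mirrors the summary's claim that only the directory entries needed at runtime are allocated. Replacement, invalidation, and the actual coherence messages are left out.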