Abstract
Translation Lookaside Buffers (TLBs) are commonly employed in modern processor designs and have considerable impact on overall system performance. A number of past works have studied TLB designs to lower access times and miss rates, specifically for uniprocessors. With the growing dominance of chip multiprocessors (CMPs), it is necessary to examine TLB performance in the context of parallel workloads.
This work is the first to present TLB prefetchers that exploit commonality in TLB miss patterns across cores in CMPs. We propose and evaluate two Inter-Core Cooperative (ICC) TLB prefetching mechanisms, assessing their effectiveness at eliminating TLB misses both individually and together. Our results show these approaches require at most modest hardware and can collectively eliminate 19% to 90% of data TLB (D-TLB) misses across the surveyed parallel workloads.
We also compare performance improvements across a range of hardware and software implementation possibilities. We find that a fully hardware implementation improves average performance by 8% to 46% across a range of TLB sizes, while a hardware/software approach yields improvements of 4% to 32%. Overall, our work shows that TLB prefetchers exploiting inter-core correlations can effectively eliminate TLB misses.
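To make the idea of exploiting inter-core commonality in TLB miss patterns concrete, here is a toy sketch in Python. It is not the mechanism evaluated in this work: the abstract does not spell out the two ICC prefetchers, so the design below (a per-core prefetch buffer that receives the page numbers other cores miss on), the names `Core` and `simulate`, the buffer sizes, and the synthetic trace are all assumptions made purely for illustration.

```python
from collections import OrderedDict


class Core:
    """One core with a tiny TLB and a tiny prefetch buffer (sizes are hypothetical)."""

    def __init__(self, tlb_entries, pb_entries):
        self.tlb = OrderedDict()  # virtual page numbers, least recently used first
        self.pb = OrderedDict()   # prefetch buffer, oldest entry first
        self.tlb_entries = tlb_entries
        self.pb_entries = pb_entries
        self.misses = 0

    @staticmethod
    def fill(table, capacity, vpn):
        """Insert vpn as the newest entry and evict the oldest one if over capacity."""
        table[vpn] = True
        table.move_to_end(vpn)
        if len(table) > capacity:
            table.popitem(last=False)


def simulate(trace, n_cores, cooperative, tlb_entries=4, pb_entries=4):
    """Count D-TLB misses for a trace of (core_id, virtual_page_number) pairs."""
    cores = [Core(tlb_entries, pb_entries) for _ in range(n_cores)]
    for core_id, vpn in trace:
        core = cores[core_id]
        if vpn in core.tlb:
            core.tlb.move_to_end(vpn)  # TLB hit: refresh LRU position
            continue
        if vpn in core.pb:
            # Translation was pushed here by another core: the miss is avoided.
            del core.pb[vpn]
            Core.fill(core.tlb, core.tlb_entries, vpn)
            continue
        core.misses += 1               # true miss: a page walk would be needed
        Core.fill(core.tlb, core.tlb_entries, vpn)
        if cooperative:
            # Assumed policy: share the missing page with every other core.
            for other in cores:
                if other is not core:
                    Core.fill(other.pb, other.pb_entries, vpn)
    return sum(c.misses for c in cores)


# Synthetic trace: core 1 touches each page shortly after core 0 does.
trace = []
for page in range(8):
    trace.append((0, page))
    trace.append((1, page))

baseline_misses = simulate(trace, n_cores=2, cooperative=False)
cooperative_misses = simulate(trace, n_cores=2, cooperative=True)
print(baseline_misses, cooperative_misses)
```

On this hand-built trace the baseline takes 16 misses and the cooperative variant 8, because every page core 1 needs has already been pushed into its prefetch buffer by core 0. Real workloads share far less regularly, so these numbers say nothing about the reported results; the sketch only shows why correlated miss streams across cores leave room for such prefetching.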
Inter-core cooperative TLB for chip multiprocessors
Published in ASPLOS XV: Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems