DOI: 10.1145/3243176.3243181
research-article

Near-side prefetch throttling: adaptive prefetching for high-performance many-core processors

Published: 01 November 2018

ABSTRACT

In modern processors, prefetching is an essential component for hiding long-latency memory accesses. However, prefetching too aggressively can easily degrade performance by evicting useful data from cache, or by saturating precious memory bandwidth. Tuning the prefetcher's activity is thus an important problem. Existing techniques tend to focus on detecting negative symptoms of aggressive prefetching, such as unused prefetches being evicted or memory bandwidth saturation, and throttle the prefetcher in response.

We argue that these far-side throttling techniques are inefficient because they require significant tracking state and react to negative effects rather than acting proactively. We propose an alternative technique, which we call near-side throttling, that detects late prefetches and tunes the prefetch distance to closely track the point at which most prefetches are not late. Because late prefetches are by definition useful, detecting late prefetches alone suffices to detect and prevent useless prefetches as well. Our solution is cheap to implement in hardware, includes throttling on off-chip bandwidth saturation, applies to both hardware and software prefetching, and can control multiple concurrent prefetchers, in which case it naturally allows the most useful prefetch algorithm to generate most of the requests. Through detailed simulation of a many-core architecture running a wide range of sequential and parallel applications, we show that our near-side throttling (NST) proposal performs similarly to the state-of-the-art feedback-directed prefetching (FDP), even though it has a significantly lower implementation cost, reacts more quickly to changes in application behavior, and is applicable to a more varied set of use cases.
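
The control loop described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism from the paper: the abstract gives no interval length, threshold, or step size, so the class name `NearSideThrottle`, the 16-prefetch interval, the 50% reading of "most prefetches are not late", and the step of one line are all hypothetical. The idea it tries to capture is that the distance grows only while late prefetches keep being observed. Useless prefetches never show up as late, so the distance shrinks on its own.

```python
from dataclasses import dataclass


@dataclass
class NearSideThrottle:
    """Toy model: adapt prefetch distance using late-prefetch feedback only."""

    distance: int = 4            # current prefetch distance in cache lines (assumed unit)
    min_distance: int = 1        # assumed lower bound
    max_distance: int = 64       # assumed upper bound
    interval: int = 16           # issued prefetches per adjustment (assumed)
    late_threshold: float = 0.5  # "most prefetches are late" read as > 50% (assumed)
    issued: int = 0
    late: int = 0

    def on_late_prefetch(self) -> None:
        """A demand access found its prefetch still in flight."""
        self.late += 1

    def on_prefetch_issued(self, bandwidth_saturated: bool = False) -> int:
        """Count one issued prefetch; adjust the distance at interval end."""
        self.issued += 1
        if self.issued >= self.interval:
            mostly_late = self.late > self.late_threshold * self.issued
            if mostly_late and not bandwidth_saturated:
                # Prefetches are useful but arrive too late: look further ahead.
                self.distance = min(self.max_distance, self.distance + 1)
            else:
                # Few late prefetches (timely or useless), or memory bandwidth
                # is saturated: back off.
                self.distance = max(self.min_distance, self.distance - 1)
            self.issued = 0
            self.late = 0
        return self.distance
```

In this model a stream whose prefetches are never used produces no late events, so the distance decays to `min_distance` without any separate tracking of unused or evicted prefetches. Only the `bandwidth_saturated` flag stands in for the off-chip bandwidth throttling mentioned in the abstract.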


  • Published in
    PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
    November 2018
    494 pages
    ISBN: 9781450359863
    DOI: 10.1145/3243176

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 121 of 471 submissions, 26%
