ABSTRACT
In modern processors, prefetching is an essential component for hiding long-latency memory accesses. However, prefetching too aggressively can easily degrade performance by evicting useful data from cache, or by saturating precious memory bandwidth. Tuning the prefetcher's activity is thus an important problem. Existing techniques tend to focus on detecting negative symptoms of aggressive prefetching, such as unused prefetches being evicted or memory bandwidth saturation, and throttle the prefetcher in response.
We argue that these far-side throttling techniques are inefficient because they require significant tracking state, and because they react to negative effects rather than proactively preventing them. We propose an alternative technique, which we term near-side throttling (NST): it detects late prefetches and tunes the prefetch distance to closely track the point at which most prefetches are no longer late. Because late prefetches are by definition useful, detecting only late prefetches suffices to detect and prevent useless prefetches as well. Our solution is cheap to implement in hardware, includes throttling on off-chip bandwidth saturation, applies to both hardware and software prefetching, and can control multiple concurrent prefetchers, naturally allowing the most useful prefetch algorithm to generate most of the requests. Through detailed simulation of a many-core architecture running a wide range of sequential and parallel applications, we show that NST performs similarly to the state-of-the-art feedback-directed prefetching (FDP), even though it has a significantly lower implementation cost, can react more quickly to changes in application behavior, and is applicable to a more varied set of use cases.
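The core control loop described above can be sketched in a few lines. The following is a minimal illustration of the near-side idea only; the class name, thresholds, epoch length, and doubling/halving policy are all illustrative assumptions, not the paper's actual mechanism or parameters.

```python
class NearSideThrottler:
    """Tune prefetch distance so that most prefetches arrive on time.

    A prefetch is 'late' when the demand access arrives before the
    prefetched line does. Since late prefetches are useful by definition,
    tracking lateness alone lets the controller grow the distance just
    far enough, without the state needed to track useless prefetches.
    All thresholds below are hypothetical placeholders.
    """

    def __init__(self, min_dist=1, max_dist=64,
                 late_hi=0.10, late_lo=0.01, interval=1024):
        self.distance = min_dist          # current prefetch distance
        self.min_dist, self.max_dist = min_dist, max_dist
        self.late_hi, self.late_lo = late_hi, late_lo
        self.interval = interval          # prefetches per decision epoch
        self.issued = 0
        self.late = 0

    def record(self, was_late):
        """Call once per completed prefetch; adjusts at epoch end."""
        self.issued += 1
        if was_late:
            self.late += 1
        if self.issued >= self.interval:
            self._adjust()

    def _adjust(self):
        late_frac = self.late / self.issued
        if late_frac > self.late_hi:
            # Many prefetches arrive too late: look further ahead.
            self.distance = min(self.distance * 2, self.max_dist)
        elif late_frac < self.late_lo:
            # Almost nothing is late: back off to save bandwidth.
            self.distance = max(self.distance // 2, self.min_dist)
        self.issued = self.late = 0
```

Note that the controller needs only two counters and the current distance, which is what makes a near-side scheme cheap compared to far-side techniques that must track individual prefetched lines through the cache.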
- 2016. APEX Application Benchmarks. http://www.lanl.gov/projects/apex/
- 2016. Intel Caffe. https://github.com/intel/caffe
- 2017. SPEC CPU2017 benchmark suite. https://www.spec.org/cpu2017/
- Alaa R. Alameldeen and David A. Wood. 2007. Interactions Between Compression and Prefetching in Chip Multiprocessors. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 228--239.
- Jean-Loup Baer and Tien-Fu Chen. 1991. An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty. In Proceedings of the ACM/IEEE Conference on Supercomputing. 176--186.
- Jean-Loup Baer and Tien-Fu Chen. 1995. Effective Hardware-Based Data Prefetching for High-Performance Processors. IEEE Trans. Comput. 44 (1995), 609--623.
- Trevor E. Carlson, Wim Heirman, and Lieven Eeckhout. 2011. Sniper: Exploring the Level of Abstraction for Scalable and Accurate Parallel Multi-Core Simulations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 52:1--52:12.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 248--255.
- Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu, and Yale N. Patt. 2011. Prefetch-Aware Shared Resource Management for Multi-Core Systems. In Proceedings of the International Symposium on Computer Architecture (ISCA). 141--152.
- Eiman Ebrahimi, Onur Mutlu, Chang Joo Lee, and Yale N. Patt. 2009. Coordinated Control of Multiple Prefetchers in Multi-Core Systems. In Proceedings of the International Symposium on Microarchitecture (MICRO). 316--326.
- Ibrahim Hur and Calvin Lin. 2006. Memory Prefetching Using Adaptive Stream Detection. In Proceedings of the International Symposium on Microarchitecture (MICRO). 397--408.
- Yasuo Ishii, Mary Inaba, and Kei Hiraki. 2011. Access Map Pattern Matching for High Performance Data Cache Prefetch. Journal of Instruction-Level Parallelism 13 (2011), 1--24.
- Akanksha Jain and Calvin Lin. 2013. Linearizing Irregular Memory Accesses for Improved Correlated Prefetching. In Proceedings of the International Symposium on Microarchitecture (MICRO). 247--259.
- Victor Jiménez, Roberto Gioiosa, Francisco J. Cazorla, Alper Buyuktosunoglu, Pradip Bose, and Francis P. O'Connell. 2012. Making Data Prefetch Smarter: Adaptive Prefetching on POWER7. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 137--146.
- Norman P. Jouppi. 1990. Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers. In Proceedings of the International Symposium on Computer Architecture (ISCA). 364--373.
- Prathmesh Kallurkar and Smruti R. Sarangi. 2016. pTask: A Smart Prefetching Scheme for OS Intensive Applications. In Proceedings of the International Symposium on Microarchitecture (MICRO). 1--12.
- Jinchun Kim, Seth H. Pugsley, Paul V. Gratz, A. L. Narasimha Reddy, Chris Wilkerson, and Zeshan Chishti. 2016. Path Confidence Based Lookahead Prefetching. In Proceedings of the International Symposium on Microarchitecture (MICRO). 1--12.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS). 1097--1105.
- Chang Joo Lee, Onur Mutlu, Veynu Narasiman, and Yale N. Patt. 2008. Prefetch-Aware DRAM Controllers. In Proceedings of the International Symposium on Microarchitecture (MICRO). 200--209.
- Shih-wei Liao, Tzu-Han Hung, Donald Nguyen, Chinyen Chou, Chiaheng Tu, and Hucheng Zhou. 2009. Machine Learning-Based Prefetch Optimization for Data Center Applications. In Proceedings of the International Conference on High Performance Computing Networking, Storage and Analysis (SC). 56:1--56:10.
- Wei-Fen Lin, Steven K. Reinhardt, and Doug Burger. 2001. Reducing DRAM Latencies with an Integrated Memory Hierarchy Design. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 301--312.
- Pierre Michaud. 2016. Best-Offset Hardware Prefetching. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 469--480.
- Kyle J. Nesbit and James E. Smith. 2005. Data Cache Prefetching Using a Global History Buffer. IEEE Micro 25, 1 (2005), 90--97.
- Biswabandan Panda. 2016. SPAC: A Synergistic Prefetcher Aggressiveness Controller for Multi-Core Systems. IEEE Trans. Comput. 65, 12 (Dec 2016), 3740--3753.
- Seth H. Pugsley, Zeshan Chishti, Chris Wilkerson, Peng-fei Chuang, Robert L. Scott, Aamer Jaleel, Shih-Lien Lu, Kingsum Chow, and Rajeev Balasubramonian. 2014. Sandbox Prefetching: Safe Run-time Evaluation of Aggressive Prefetchers. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 626--637.
- Andres Rodriguez. 2016. Training and Deploying Deep Learning Networks with Caffe* Optimized for Intel® Architecture. Intel Developer Zone.
- Vivek Seshadri, Samihan Yedkar, Hongyi Xin, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2015. Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks. ACM Transactions on Architecture and Code Optimization (TACO) 11, 4 (Jan. 2015), 51:1--51:22.
- Avinash Sodani. 2015. Knights Landing (KNL): 2nd Generation Intel® Xeon Phi Processor. In Hot Chips 27 Symposium.
- Santhosh Srinath, Onur Mutlu, Hyesoon Kim, and Yale N. Patt. 2007. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 63--74.
- Carole-Jean Wu, Aamer Jaleel, Margaret Martonosi, Simon C. Steely, Jr., and Joel Emer. 2011. PACMan: Prefetch-Aware Cache Management for High Performance Caching. In Proceedings of the International Symposium on Microarchitecture (MICRO). 442--453.
- Carole-Jean Wu and Margaret Martonosi. 2011. Characterization and Dynamic Mitigation of Intra-Application Cache Interference. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--11.
- Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, and Srinivas Devadas. 2015. IMP: Indirect Memory Prefetcher. In Proceedings of the International Symposium on Microarchitecture (MICRO). 178--190.