Abstract
The expanding gap between microprocessor and DRAM performance has necessitated the use of increasingly aggressive techniques designed to reduce or hide the latency of main memory access. Although large cache hierarchies have proven to be effective in reducing this latency for the most frequently used data, it is still not uncommon for many programs to spend more than half their run times stalled on memory requests. Data prefetching has been proposed as a technique for hiding the access latency of data referencing patterns that defeat caching strategies. Rather than waiting for a cache miss to initiate a memory fetch, data prefetching anticipates such misses and issues a fetch to the memory system in advance of the actual memory reference. To be effective, prefetching must be implemented in such a way that prefetches are timely, useful, and introduce little overhead. Secondary effects such as cache pollution and increased memory bandwidth requirements must also be taken into consideration. Despite these obstacles, prefetching has the potential to significantly improve overall program execution time by overlapping computation with memory accesses. Prefetching strategies are diverse, and no single strategy has yet been proposed that provides optimal performance. The following survey examines several alternative approaches, and discusses the design tradeoffs involved when implementing a data prefetch strategy.
- ALEXANDER,T.AND KEDEM, G. 1996. Distributed prefetch-buffer/cache design for high performance memory systems. In Proceedings of 2nd IEEE Symposium on High-Performance Computer Architecture. IEEE Press, Piscataway, NJ, 254-263.]] Google ScholarDigital Library
- ANACKER,W.AND WANG, C. P. 1967. Performance evaluation of computing systems with memory hierarchies. IEEE Trans. Comput. 16,6, 764-773.]]Google ScholarCross Ref
- BAER, J.-L. AND CHEN, T.-F. 1991. An effective on-chip preloading scheme to reduce data access penalty. In Proceedings of the 1991 Conference on Supercomputing (Albuquerque, NM, Nov. 18-22), J. L. Martin, Chair. ACM Press, New York, NY, 176-186.]] Google ScholarDigital Library
- BERNSTEIN, D., COHEN, D., AND FREUND, A. 1995. Compiler techniques for data prefetching on the PowerPC. In Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT '95, Limassol, Cyprus, June 27-29), L. Bic, P. Evripidou, W. B~hm, and J.-L. Gaudiot, Chairs. IFIP Working Group on Algol, Manchester, UK, 19-26.]] Google ScholarDigital Library
- BURGER, D., GOODMAN,J.R.,AND KGI, A. 1997. Limited bandwidth to affect processor design. IEEE Micro 17, 6, 55-62.]] Google ScholarDigital Library
- CALLAHAN, D., KENNEDY, K., AND PORTERFIELD,A. 1991. Software prefetching. SIGARCH Comput. Arch. News 19, 2 (Apr.), 40-52.]] Google ScholarDigital Library
- CASMIRA,J.P.AND KAELI, D. R. 1995. Modeling cache pollution. In Proceedings of the Second IASTED Conference on Modeling and Simulation. 123-126.]]Google Scholar
- CHAN, K. K. 1996. Design of the HP PA 7200 CPU. Hewlett-Packard J. 47, 1, 25-33.]]Google Scholar
- CHANG,P.Y.,KAELI, D., AND LIU, Y. 1994. Branchdirected data cache prefetching. In Proceedings of Second Workshop on Shared-Memory Multiprocessing Systems.]]Google Scholar
- CHEN, T.-F. 1995. An effective programmable prefetch engine for on-chip caches. In Proceedings of the 28th Annual International Symposium on Microarchitecture (Ann Arbor, MI, Nov. 29 - Dec. 1), T. Mudge and K. Ebciogelu, Chairs. IEEE Computer Society Press, Los Alamitos, CA, 237-242.]] Google ScholarDigital Library
- CHEN, T.-F. AND BAER, J. L. 1994. A performance study of software and hardware data prefetching schemes. In Proceedings of 21st International Symposium on Computer Architecture (Chicago, IL, Apr.). 223-232.]] Google ScholarDigital Library
- CHEN, T.-F. AND BEAR, J. L. 1995. Effective hardware-based data prefetching for high-performance processors. IEEE Trans. Comput. 44,5 (May), 609-623.]] Google ScholarDigital Library
- CHEN,W.Y.,MAHLKE,S.A.,CHANG,P.P.,AND HWU, W.-M. W. 1991. Data access microarchitectures for superscalar processors with compiler- assisted data prefetching. In Proceedings of the 24th Annual International Symposium on Microarchitecture (MICRO 24, Albuquerque, NM, Nov. 18-20), Y. K. Malaiya, Chair. ACM Press, New York, NY, 69-73.]] Google ScholarDigital Library
- DAHLGREN,F.AND STENSTROM, P. 1995. Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors. In Proceedings of 1st IEEE Symposium on High-Performance Computer Architecture (Raleigh, NC, Jan.). IEEE Press, Piscataway, NJ, 68-77.]] Google ScholarDigital Library
- DAHLGREN, F., DUBOIS, M., AND STENSTROM,P. 1993. Fixed and adaptive sequential prefetching in shared-memory multiprocessors. In Proceedings of the International Conference on Parallel Processing (St. Charles, IL). 56-63.]] Google ScholarDigital Library
- FU, B., SAINI, A., AND GELSINGER, P. P. 1989. Performance and microarchitecture of the i486 processor. In Proceedings of the IEEE International Conference on Computer Design (Cambridge, MA). 182-187.]]Google ScholarCross Ref
- FU,J.W.C.AND PATEL, J. H. 1991. Data prefetching in multiprocessor vector cache memories. In Proceedings of 18th International Symposium on Computer Architecture (Toronto, Ont., Canada). 54-63.]] Google ScholarDigital Library
- FU,J.W.C.,PATEL,J.H.,AND JANSSENS,B.L. 1992. Stride directed prefetching in scalar processors. In Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25, Portland, OR, Dec. 1-4), W.-m. Hwu, Chair. IEEE Computer Society Press, Los Alamitos, CA, 102-110.]] Google ScholarDigital Library
- GORNISH,E.H.AND VEIDENBAUM, A. V. 1994. An integrated hardware/software scheme for shared-memory multiprocessors. In Proceedings of International Conference on Parallel Processing (St. Charles, IL). 281-284.]] Google ScholarDigital Library
- GORNISH,E.H.,GRANSTON,E.D.,AND GRANSTON, A. V. 1990. Compiler-directed data prefetching in multiprocessors with memory hierarchies. In Proceedings of the 1990 ACM International Conference on Supercomputing (ICS '90, Amsterdam, The Netherlands, June 11-15), A. Sameh and H. van der Vorst, Chairs. ACM Press, New York, NY, 354-368.]] Google ScholarDigital Library
- HARRISON,L.AND MEHROTRA, S. 1994. A data prefetch mechanism for accelerating general computation. 1351. University of Illinois at Urbana-Champaign, Champaign, IL.]]Google Scholar
- JOSEPH,D.AND GRUNWALD, D. 1997. Prefetching using Markov predictors. In Proceedings of the 24th International Symposium on Computer Architecture (ISCA '97, Denver, CO, June 2-4), A. R. Pleszkun and T. Mudge, Chairs. ACM Press, New York, NY, 252-263.]] Google ScholarDigital Library
- JOUPPI, N. 1990. Improving direct-mapped cache performance by the addition of a small fullyassociative cache and prefetch buffers. In Proceedings of the 17th International Symposium on Computer Architecture (ISCA '90, Seattle, WA, May). IEEE Press, Piscataway, NJ, 364- 373.]] Google ScholarDigital Library
- KLAIBER,A.C.AND LEVY, H. M. 1991. An architecture for software-controlled data prefetching. SIGARCH Comput. Arch. News 19, 3 (May), 43-53.]] Google ScholarDigital Library
- KROFT, D. 1981. Lockup-free instruction fetch/ prefetch cache organization. In Proceedings of the 8th International Symposium on Computer Architecture (Minneapolis, MN, June). ACM Press, New York, NY, 81-85.]] Google ScholarDigital Library
- LILJA, D. J. 1993. Cache coherence in large-scale shared-memory multiprocessors: Issues and comparisons. ACM Comput. Surv. 25,3 (Sept.), 303-338.]] Google ScholarDigital Library
- LIPASTI,M.H.,SCHMIDT,W.J.,KUNKEL,S.R.,AND ROEDIGER, R. R. 1995. SPAID: Software prefetching in pointer- and call-intensive environments. In Proceedings of the 28th Annual International Symposium on Microarchitecture (Ann Arbor, MI, Nov. 29-Dec. 1), T. Mudge and K. Ebciogclu, Chairs. IEEE Com-puter Society Press, Los Alamitos, CA, 231- 236.]] Google ScholarDigital Library
- LUI,Y.AND KAELI, D. R. 1996. Branch-directed and stride-based data cache prefetching. In Proceedings of International Conference on Computer Design (ICCD'96, Austin, TX). IEEE Computer Society Press, Los Alamitos, CA, 255-230.]] Google ScholarDigital Library
- LUK, C.-K. AND MOWRY, T. C. 1996. Compilerbased prefetching for recursive data structures. ACM SIGOPS Oper. Syst. Rev. 30, 5, 222-233.]] Google ScholarDigital Library
- MOWRY,T.AND GUPTA, A. 1991. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. J. Parallel Distrib. Comput. 12, 2 (June), 87-106.]] Google ScholarDigital Library
- MOWRY,T.C.,LAM,M.S.,AND GUPTA, A. 1992. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V, Boston, MA, Oct. 12-15), S. Eggers, Chair. ACM Press, New York, NY, 62-73.]] Google ScholarDigital Library
- OBERLIN, S., KESSLER, R., SCOTT, S., AND THORSON, G. 1996. Cray T3E architecture overview. Cray Supercomputers, Chippewa Falls, MN.]]Google Scholar
- PALACHARLA,S.AND KESSLER, R. E. 1994. Evaluating stream buffers as a secondary cache replacement. In Proceedings of 21st International Symposium on Computer Architecture (Chicago, IL, Apr.).]] Google ScholarDigital Library
- PATTERSON,R.H.AND GIBSON, G. A. 1994. Exposing I/O concurrency with informed prefetching. In Proceedings of the Third International Conference on Parallel and Distributed Information Systems (Austin, TX). 7-16.]] Google ScholarDigital Library
- PORTERFIELD, A. K. 1989. Software methods for improvement of cache performance on supercomputer applications. Ph.D thesis,. Ph.D. Dissertation. Rice University, Houston, TX.]] Google ScholarDigital Library
- PRZYBYLSKI, S. 1990. The performance impact of block sizes and fetch strategies. In Proceedings of the 17th International Symposium on Computer Architecture (Seattle, WA). 160- 169.]] Google ScholarDigital Library
- ROTH, A., MOSHOVOS, A., AND SOHI, G. 1998. Dependance based prefetching for linked data structures. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII, San Jose, CA, Oct. 3-7), D. Bhandarkar and A. Agarwal, Chairs. ACM Press, New York, NY.]] Google ScholarDigital Library
- SANTHANAM, V., GORNISH,E.H.,AND HSU, W.-C. 1997. Data prefetching on the HP PA-8000. In Proceedings of the 24th International Symposium on Computer Architecture (ISCA '97, Denver, CO, June 2-4), A. R. Pleszkun and T. Mudge, Chairs. ACM Press, New York, NY, 264-273.]] Google ScholarDigital Library
- SKLENAR, I. 1992. Prefetch unit for vector operations on scalar computers. In Proceedings of the 19th International Symposium on Computer Architecture (Gold Coast, Qld., Australia). 31-37.]] Google ScholarDigital Library
- SMITH, A. J. 1978. Sequential program prefetching in memory hierarchies. IEEE Computer 11, 12, 7-21.]]Google ScholarDigital Library
- SMITH, A. J. 1982. Cache memories. ACM Comput. Surv. 14, 3 (Sept.), 473-530.]] Google ScholarDigital Library
- SMITH, J. E. 1981. A study of branch prediction techniques. In Proceedings of the 8th International Symposium on Computer Architecture (Minneapolis, MN, June). ACM Press, New York, NY, 135-147.]] Google ScholarDigital Library
- VANDERWIEL,S.P.AND LILJA, D. J. 1999. A compiler-assisted data prefetch controller. In Proceedings of International Conference on Computer Design (ICCD '99, Austin TX).]] Google ScholarDigital Library
- VANDERWIEL,S.P.,HSU,W.C.,AND LILJA,D.J. 1997. When caches are not enough: Data prefetching techniques. IEEE Computer 30,7, 23-27.]] Google ScholarDigital Library
- YEAGER, K. C. 1996. The MIPS R10000 superscalar microprocessor. IEEE Micro 16, 2 (Apr.), 28-40.]] Google ScholarDigital Library
- YOUNG,H.C.AND SHEKITA, E. J. 1993. An intelligent I-cache prefetch mechanism. In Proceedings of the International Conference on Computer Design (ICCD'93, Cambridge, MA). IEEE Computer Society Press, Los Alamitos, CA, 44-49.]]Google ScholarCross Ref
- ZHANG,Z.AND TORRELLAS, J. 1995. Speeding up irregular applications in shared-memory multiprocessors. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA '95, Santa Margherita Ligure, Italy, June 22-24), D. A. Patterson, Chair. ACM Press, New York, NY.]] Google ScholarDigital Library
Index Terms
Data prefetch mechanisms
Recommendations
Prefetch-Aware Memory Controllers
Existing DRAM controllers employ rigid, nonadaptive scheduling and buffer management policies when servicing prefetch requests. Some controllers treat prefetches the same as demand requests, and others always prioritize demands over prefetches. However, ...
Reducing memory latency using a small software driven array cache
HICSS '95: Proceedings of the 28th Hawaii International Conference on System SciencesFrom the programming viewpoint, data references can be classified into two types: array reference and non-array references. Array references have relatively strong spatial locality while non-array references have relatively strong temporal locality. ...
Comments