DOI: 10.1145/1152779.1147354
Article

Data trace cache: an application specific cache architecture

Published: 17 September 2005

Abstract

Benefits of advances in processor technology have long been held hostage to the widening processor-memory gap. Off-chip memory access latency is one of the most critical parameters limiting system performance. Caches have been used to alleviate this problem by reducing the average memory access latency. The memory bottleneck assumes greater significance for high-performance computer architectures with high data throughput requirements, such as network processors.

This paper addresses the memory bottleneck with the goal of minimizing off-chip memory demand and average memory access latency by proposing the use of small, application-specific, compiler-visible data trace caches. We focus on tree data structures, which are responsible for a significant component of the memory traffic in several applications. We have observed that tree accesses create a simple-to-characterize trace of memory references, and we propose a data trace cache design to exploit the locality of reference in these data traces.

Our study reveals that, for small cache sizes (256-1024 bytes), data trace caches reduce the total number of misses for accesses to rooted tree data structures by 7% to 53% compared to a conventional cache across a variety of applications. Such caches are in keeping with the philosophy of victim caches, stream buffers, and pre-fetch buffers, in that a relatively small investment in silicon can realize a substantive reduction in off-chip memory bandwidth demand.
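To make the locality argument concrete, the following is a minimal sketch that generates the memory-reference trace of repeated lookups in a binary search tree and counts misses in a small direct-mapped cache of the sizes the abstract mentions (256-1024 bytes). The node size, line size, workload, and cache organization are illustrative assumptions, not the paper's data trace cache design; because every lookup re-touches the nodes near the root, the resulting trace exhibits exactly the kind of reuse a small trace-oriented cache is meant to capture.

```python
# Sketch: memory-reference trace from binary-tree lookups, fed to a small
# direct-mapped cache. Node layout, sizes, and the key mix are assumptions
# made for illustration only.
import random

NODE_SIZE = 32           # assumed bytes per tree node
LINE_SIZE = 32           # assumed cache line size in bytes
CACHE_SIZE = 512         # within the 256-1024 byte range studied
NUM_LINES = CACHE_SIZE // LINE_SIZE

class Node:
    __slots__ = ("key", "addr", "left", "right")
    def __init__(self, key, addr):
        self.key, self.addr = key, addr
        self.left = self.right = None

def build_tree(keys):
    """Build an unbalanced BST; each node gets a distinct synthetic address."""
    root, next_addr = None, 0x1000
    for k in keys:
        node = Node(k, next_addr)
        next_addr += NODE_SIZE
        if root is None:
            root = node
            continue
        cur = root
        while True:
            if k < cur.key:
                if cur.left is None:
                    cur.left = node; break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node; break
                cur = cur.right
    return root

def lookup_trace(root, key):
    """Return the sequence of node addresses touched by one lookup."""
    trace, cur = [], root
    while cur is not None:
        trace.append(cur.addr)
        if key == cur.key:
            break
        cur = cur.left if key < cur.key else cur.right
    return trace

def simulate_direct_mapped(trace):
    """Count misses for a direct-mapped cache over an address trace."""
    tags, misses = [None] * NUM_LINES, 0
    for addr in trace:
        block = addr // LINE_SIZE
        idx, tag = block % NUM_LINES, block // NUM_LINES
        if tags[idx] != tag:
            misses += 1
            tags[idx] = tag
    return misses

if __name__ == "__main__":
    random.seed(0)
    keys = random.sample(range(10_000), 1_000)
    root = build_tree(keys)
    # Repeated lookups: references near the root recur in every traversal,
    # which is the locality a dedicated data trace cache could exploit.
    full_trace = []
    for _ in range(2_000):
        full_trace.extend(lookup_trace(root, random.choice(keys)))
    misses = simulate_direct_mapped(full_trace)
    print(f"references={len(full_trace)} misses={misses} "
          f"miss rate={misses / len(full_trace):.2%}")
```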

References

[1]
Intel Itanium 2 Processor Hardware Developer's Manual, July 2002.
[2]
Intel IXP2800 Network Processor Hardware Reference Manual, November 2002.
[3]
C.-K. Luk and T. C. Mowry, "Automatic compiler-inserted prefetching for pointer-based applications." IEEE Transactions on Computers, vol. 48, no. 2, pp. 134--141, 1999.
[4]
T. C. Mowry, M. S. Lam, and A. Gupta, "Design and evaluation of a compiler algorithm for prefetching." in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS), 1992, pp. 62--73.
[5]
J. Kim, R. M. Rabbah, K. V. Palem, and W.-F. Wong, "Adaptive compiler directed prefetching for epic processors." in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications(PDPTA), 2004, pp. 495--501.
[6]
J. Kim, K. V. Palem, and W.-F. Wong, "A framework for data prefetching using off-line training of markovian predictors." in Proceedings of the 20th International Conference on Computer Design (ICCD), VLSI in Computers and Processors, September 2002, pp. 340--347.
[7]
S. Jiang and X. Zhang, "LIRS: An efficient low inter-reference recency set replacement policy to improve buffer cache performance." in Proceedings of the International Conference on Measurements and Modeling of Computer Systems, SIGMETRICS, June 2002, pp. 31--42.
[8]
K. Hazelwood, M. C. Toburen, and T. M. Conte, "A case for exploiting memory-access persistence," in "Workshop on Memory Performance Issues, June 2001.
[9]
T. M. Chilimbi, M. D. Hill, and J. R. Larus, "Cache-conscious structure layout." in Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1999, pp. 1--12.
[10]
R. M. Rabbah and K. V. Palem, "Data remapping for design space optimization of embedded memory systems." ACM Transactions in Embedded Computing Systems, vol. 2, no. 2, pp. 186--218, 2003.
[11]
P. C., "Locality and route caches," in Proceedings of the NSF Workshop on Internet Statistics Measurement and Analysis, February 1996.
[12]
T. cker Chiueh and P. Pradhan, "High performance routing table lookup using CPU caching," in Proceedings IEEE INFOCOM, The Conference on Computer Communications, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3, March 1999, pp. 1421--1428. {Online}. Available: citeseer.ist.psu.edu/article/chiueh99highperformance.html
[13]
K. Gopalan and T. cker Chiueh, "Improving route lookup performance using network processor cache." in Proceedings of the 2002 ACM/IEEE conference on Supercomputing, November 2002, pp. 1--10.
[14]
J.-L. Baer, D. Low, P. Crowley, and N. Sidhwaney, "Memory hierarchy design for a multiprocessor look-up engine." in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2003, pp. 206--216.
[15]
P. Gupta and N. McKeown, "Packet classification on multiple fields." in Proceedings of the ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication(SIGCOMM), 1999, pp. 147--160.
[16]
V. Srinivasan, S. Suri, and G. Varghese, "Packet classification using tuple space search." in Proceedings of the ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 1999, pp. 135--146.
[17]
T. Wolf and M. Franklin, "Commbench --- a telecommunications benchmark for network processors." in Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, April 2000, pp. 154--162. {Online}. Available: citeseer.ist.psu.edu/wolf00commbench.html
[18]
"The Stony Brook Algorithm Repository." {Online}. Available: http://www.cs.sunysb.edu/~algorith/
[19]
"Valgrind tool suite version - 2.1.2." {Online}. Available: http://www.valgrind.org
[20]
"Dinero IV Trace-Driven Uniprocessor Cache Simulator." {Online}. Available: http://www.cs.wisc.edu/~markhill/DineroIV
[21]
"CACTI, HP-Compaq Western Research Lab." {Online}. Available: http://research.compaq.com/wrl/people/jouppi/CACTI.html
[22]
I. Stoica, "Stateless core: A scalable approach for quality of service in the internet," 2001, doctoral Dissertation.

Cited By

  • Customized placement for high performance embedded processor caches. Proceedings of the 20th International Conference on Architecture of Computing Systems (ARCS 2007), pp. 69-82. Online publication date: 12-Mar-2007. DOI: 10.5555/1763274.1763280
  • Customized Placement for High Performance Embedded Processor Caches. Architecture of Computing Systems - ARCS 2007, pp. 69-82, 2007. DOI: 10.1007/978-3-540-71270-1_6

Published In

MEDEA '05: Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
September 2005, 76 pages

Also published in: ACM SIGARCH Computer Architecture News, Volume 34, Issue 1 (Special issue: MEDEA'05), March 2006, 86 pages. ISSN: 0163-5964. DOI: 10.1145/1147349


Publisher

IEEE Computer Society

United States

Acceptance Rates

Overall Acceptance Rate: 6 of 9 submissions, 67%
