Article

I-cache multi-banking and vertical interleaving

Author:

Sangyeun Cho

GLSVLSI '07: Proceedings of the 17th ACM Great Lakes symposium on VLSI

Pages 14 - 19

https://doi.org/10.1145/1228784.1228794

Published: 11 March 2007

Abstract

This research investigates the impact of a microarchitectural technique called vertical interleaving in multi-banked caches. Unlike previous multi-banking and interleaving techniques, which aim to increase cache bandwidth, the proposed vertical interleaving further divides the memory banks of a cache into vertically arranged sub-banks, which are selectively accessed based on the memory address. Under this setting, we are particularly interested in how accesses to the instruction cache are dispersed across the cache banks. We quantitatively analyze the memory access pattern seen by each cache bank and establish the relationship between important cache parameters and the access patterns. Our study shows that vertical interleaving distributes accesses among the banks with tightly bounded run lengths. We then discuss possible applications of the presented concept, including power density reduction. Very simple interleaving configurations can reduce the maximum power density by as much as 67% under a realistic machine configuration. Our study suggests that vertically interleaving cache lines has potential for optimizing memory accesses in a number of interesting ways.
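As a rough illustration of the idea described in the abstract, the following Python sketch maps fetch addresses to a (bank, sub-bank) pair and measures run lengths, that is, the number of consecutive accesses that land in the same sub-bank. The mapping, the function names, and all parameter values (64-byte lines, 4 banks, 4 sub-banks per bank, 4-byte instructions) are assumptions chosen for illustration; the abstract does not specify the paper's actual address mapping or machine configuration.

```python
from itertools import groupby


def bank_and_sub_bank(addr, line_bytes=64, num_banks=4, sub_banks_per_bank=4):
    """Map a byte address to (bank, sub_bank).

    Hypothetical mapping, not taken from the paper: consecutive cache
    lines rotate across banks first, then across the sub-banks that are
    stacked inside each bank.
    """
    line = addr // line_bytes                        # cache line number
    bank = line % num_banks                          # bank interleaving
    sub_bank = (line // num_banks) % sub_banks_per_bank  # sub-bank within the bank
    return bank, sub_bank


def run_lengths(seq):
    """Return the lengths of maximal runs of identical consecutive items."""
    return [len(list(group)) for _, group in groupby(seq)]


if __name__ == "__main__":
    # Straight-line fetch of 4-byte instructions across 8 cache lines
    # (an assumed, idealized access stream with no branches).
    addrs = range(0, 64 * 8, 4)
    units = [bank_and_sub_bank(a) for a in addrs]
    print(run_lengths(units))
```

For this idealized sequential stream, every run is bounded by the number of instructions per cache line (16 under the assumed parameters), because the next line always maps to a different (bank, sub-bank) pair. Real instruction streams contain branches and loops, so measured run lengths would depend on the workload.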

Index Terms

I-cache multi-banking and vertical interleaving
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory
  2. Very large scale integration design

Published In

GLSVLSI '07: Proceedings of the 17th ACM Great Lakes symposium on VLSI

March 2007

626 pages

ISBN:9781595936059

DOI:10.1145/1228784

General Chairs: Hai Zhou (Northwestern University), Enrico Macii (Politecnico di Torino)

Program Chairs: Zhiyuan Yan (Lehigh University), Yehia Massoud (Rice University)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

GLSVLSI07

GLSVLSI07: Great Lakes Symposium on VLSI 2007

March 11 - 13, 2007

Stresa-Lago Maggiore, Italy

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%
