
No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips

Authors:
Shu-Hsuan Chou
Chien-Chih Chen
Chi-Neng Wen
Yi-Chao Chan
Tien-Fu Chen
Chao-Ching Wang
Jinn-Shyan Wang

All authors: National Chung Cheng University, Taiwan, R.O.C.

DAC '09: Proceedings of the 46th Annual Design Automation Conference, July 2009, Pages 587–592. https://doi.org/10.1145/1629911.1630062

Published: 26 July 2009

ABSTRACT

In line with the trend toward many-core SOCs and 3D chip integration, this paper proposes a "single-cycle ring" interconnection (SC_Ring) with ultra-low latency and minimal complexity. The SC_Ring allows multiple single-cycle transactions to proceed in parallel. The circuit-switched design consists of a set of 3-ported circuit-switched routers (4~16) and an arbiter that is efficient in both performance and timing. The arbiter, called "BTPC", performs arbitration and routing control in a single cycle by means of novel binary-tree path convergence and path-prediction mechanisms, which greatly reduce time complexity. Combined with 3D chip integration, the proposed ring-based interconnection offers cost, latency, and power advantages for hierarchical clustering in future many-core systems. As a case study, this work builds on the SC_Ring to realize a "level-1 non-uniform cache architecture" (L1-NUCA) that provides fast data communication for multithreaded, multi-core execution without cache coherence. Experimental results show that the approach yields promising performance.
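To illustrate how a circuit-switched ring can carry several transactions in the same cycle, the following is a minimal sketch and not the paper's BTPC arbiter. It assumes a unidirectional (clockwise) ring, a fixed priority given by request order, and a greedy grant rule. The names `links_used` and `arbitrate` are hypothetical and chosen only for this example.

```python
from typing import List, Set, Tuple

Request = Tuple[int, int]  # (source router, destination router)


def links_used(src: int, dst: int, n: int) -> Set[int]:
    """Links occupied by a clockwise path on an n-router ring.

    Assumption: link i joins router i to router (i + 1) % n.
    """
    hops = (dst - src) % n
    return {(src + k) % n for k in range(hops)}


def arbitrate(requests: List[Request], n: int) -> List[Request]:
    """Grant, in request order, each request whose links are all still free.

    This greedy fixed-priority rule is an illustrative stand-in; the paper's
    arbiter uses binary-tree path convergence and path prediction instead.
    """
    busy: Set[int] = set()
    granted: List[Request] = []
    for src, dst in requests:
        path = links_used(src, dst, n)
        # An empty path (src == dst) needs no ring transaction.
        if path and not (path & busy):
            busy |= path
            granted.append((src, dst))
    return granted


if __name__ == "__main__":
    # 8-router ring: 0->2 and 3->6 use disjoint links; 1->4 collides with both.
    print(arbitrate([(0, 2), (3, 6), (1, 4)], n=8))
```

In this sketch, two requests are granted together only when their paths share no link, which is the condition under which a circuit-switched ring can complete both in one cycle.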

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Buses and high-speed links

Published in

DAC '09: Proceedings of the 46th Annual Design Automation Conference
July 2009
994 pages
ISBN:9781605584973
DOI:10.1145/1629911

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
NOC
SOC
arbitration
level-1 non-uniform cache architecture
memory structure
multi-core
ring interconnection
single-cycle transactions
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%