DOI: 10.1145/1629911.1630062
research-article

No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips

Published: 26 July 2009

ABSTRACT

Consistent with the trend towards many cores in SoC designs and 3D chip techniques, this paper proposes a "single-cycle ring" interconnection (SC_Ring) with ultra-low latency and minimal complexity. The proposed SC_Ring allows multiple single-cycle transactions in parallel. The main features of the circuit-switched design include a set of 3-ported circuit-switched routers (4 to 16) and a performance- and timing-effective arbiter. The arbiter, called "BTPC", features single-cycle arbitration and routing control by means of the novel Binary-Tree paths convergence and path-prediction mechanisms, which greatly reduce time complexity. Combined with 3D chip integration, the proposed ring-based interconnection offers several advantages for hierarchical clustering in future many-core systems in terms of cost, latency, and power reduction. Moreover, as a case study, this work builds on the proposed SC_Ring to realize a "level-1 non-uniform cache architecture" (L1-NUCA) that provides fast data communication without cache coherency to facilitate multithreading/multi-core operation. Finally, experimental results show that our approach yields promising performance.
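
The idea that a circuit-switched ring can carry several single-cycle transactions in parallel can be illustrated with a minimal sketch. It is not the paper's BTPC arbiter: the clockwise-only routing, the fixed-priority greedy grant order, and the names `ring_links` and `arbitrate` are assumptions made purely for illustration. The sketch only shows that requests whose paths use disjoint ring segments can all be granted in the same cycle.

```python
from typing import List, Tuple


def ring_links(src: int, dst: int, n: int) -> List[int]:
    """Clockwise links used from src to dst on an n-node ring.

    Link i is assumed to join node i to node (i + 1) % n.
    """
    links = []
    node = src
    while node != dst:
        links.append(node)
        node = (node + 1) % n
    return links


def arbitrate(requests: List[Tuple[int, int]], n: int) -> List[Tuple[int, int]]:
    """Grant, in list order, every request whose links are all still free.

    Hypothetical fixed-priority policy; the real arbiter may differ.
    """
    busy = [False] * n
    granted = []
    for src, dst in requests:
        if src == dst:
            continue  # a local access needs no ring segment
        links = ring_links(src, dst, n)
        if not any(busy[link] for link in links):
            for link in links:
                busy[link] = True  # reserve the whole circuit for this cycle
            granted.append((src, dst))
    return granted


requests = [(0, 2), (2, 3), (1, 3), (5, 7)]
print(arbitrate(requests, n=8))  # -> [(0, 2), (2, 3), (5, 7)]
```

In this example the request (1, 3) is held back because it needs link 1, which (0, 2) has already reserved, while the other three requests occupy disjoint segments and proceed together.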


Published in

DAC '09: Proceedings of the 46th Annual Design Automation Conference
July 2009
994 pages
ISBN: 9781605584973
DOI: 10.1145/1629911

Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,770 of 5,499 submissions (32%)
