ABSTRACT
Consistent with the trend toward many-core SoC designs and 3D chip integration, this paper proposes a "single-cycle ring" interconnection (SC_Ring) with ultra-low latency and minimal complexity. The SC_Ring allows multiple single-cycle transactions to proceed in parallel. The circuit-switched design comprises a set of 3-ported circuit-switched routers (4 to 16) and an arbiter that is efficient in both performance and timing. The arbiter, called "BTPC", performs arbitration and routing control in a single cycle by means of novel Binary-Tree path-convergence and path-prediction mechanisms, which greatly reduce time complexity. Combined with 3D chip integration, the proposed ring-based interconnection offers advantages in cost, latency, and power for hierarchical clustering in future many-core systems. Moreover, as a case study, this work builds on the SC_Ring to realize a "level-1 non-uniform cache architecture" (L1-NUCA) that provides fast data communication for multithreaded and multi-core execution without requiring cache coherency. Finally, experimental results show that our approach yields promising performance.
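To make the idea of several transactions sharing the ring in one cycle concrete, the following is a minimal sketch, not the BTPC arbiter described above. It assumes a ring of n nodes in which link i joins node i and node (i + 1) mod n, each link carries at most one circuit per cycle, requests are served in fixed priority order, and each request takes its shorter direction. The names `ring_paths` and `arbitrate` are hypothetical, and contention for a router's local port is ignored.

```python
from typing import List, Set, Tuple

Request = Tuple[int, int]  # (source node, destination node)


def ring_paths(src: int, dst: int, n: int) -> Tuple[List[int], List[int]]:
    """Return the links used clockwise and counter-clockwise from src to dst.

    Assumption: link i joins node i and node (i + 1) % n.
    """
    clockwise = [(src + k) % n for k in range((dst - src) % n)]
    counter = [(src - 1 - k) % n for k in range((src - dst) % n)]
    return clockwise, counter


def arbitrate(requests: List[Request], n: int) -> List[Request]:
    """Grant, in list order, every request whose path is free this cycle.

    This is a plain greedy fixed-priority scheme used for illustration only;
    it stands in for, and does not reproduce, the paper's arbiter.
    """
    busy: Set[int] = set()       # links already claimed in this cycle
    granted: List[Request] = []
    for src, dst in requests:
        clockwise, counter = ring_paths(src, dst, n)
        # Take the shorter direction; ties go clockwise.
        path = clockwise if len(clockwise) <= len(counter) else counter
        if busy.isdisjoint(path):
            busy.update(path)
            granted.append((src, dst))
    return granted


if __name__ == "__main__":
    # Three requests use disjoint links and are granted together;
    # (1, 4) needs link 1, already held by (0, 2), so it must wait.
    print(arbitrate([(0, 2), (3, 5), (1, 4), (6, 7)], n=8))
```

Under these assumptions, requests whose paths do not share a link are all granted in the same cycle, which is the property that lets a ring carry parallel transactions; a conflicting request is simply deferred to a later cycle.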
Index Terms
- No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips