DOI: 10.1145/2968456.2968471
research-article

Scalable and realistic benchmark synthesis for efficient NoC performance evaluation: a complex network analysis approach

Published: 01 October 2016

ABSTRACT

The design-space exploration of large-scale NoCs is complicated not only by the ever-increasing number of cores but also by growing runtime uncertainty in both the scale and the task structure of emerging applications. Consequently, it is crucial to develop rigorous mathematical frameworks that capture the task dependencies of varied applications and thereby foster the generation of realistic benchmarks that can guide NoC design. However, current NoC benchmark suites either lack portability and scale poorly, because they require intensive development effort on specific architectures and long simulation times, or are synthesized from purely stochastic models that are disconnected from real applications, which may easily lead to biased and/or delayed design choices. To overcome these drawbacks, we propose a benchmark synthesis framework that i) extracts the dynamic task dependencies of an application and synthesizes traffic workloads that are spatio-temporally consistent with realistic traffic behavior, and ii) can be easily scaled, through the proposed complex-network-inspired algorithm, to generate large benchmarks while preserving the key structural features that govern application communication behavior. We validate the proposed framework in a large-scale simulation environment by running a set of real applications. Experimental results show that the synthesized benchmarks respect the traffic patterns of the original applications and preserve key features of the application task structures.
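
The abstract does not spell out the scaling algorithm, so the following is only an illustrative sketch of what "scaling a task graph while preserving structural features" could look like, not the authors' method. It assumes a task graph stored as a list of `(producer, consumer)` edges over integer task ids, and it assumes (hypothetically) that the feature to preserve is the in-degree distribution, with predecessors chosen by preferential attachment. The function name `scale_task_graph` and all parameters are invented for this example.

```python
import random
from collections import Counter


def scale_task_graph(edges, num_tasks, target_tasks, seed=0):
    """Grow a task DAG from num_tasks to target_tasks nodes.

    Assumptions (not taken from the paper): num_tasks >= 1, and each new
    task draws its number of predecessors from the in-degrees observed in
    the original graph, then picks those predecessors with probability
    proportional to (current out-degree + 1).
    """
    rng = random.Random(seed)
    scaled = list(edges)
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    # One in-degree sample per original task (source tasks contribute 0).
    in_samples = [in_deg[t] for t in range(num_tasks)]

    for new_task in range(num_tasks, target_tasks):
        # Never ask for more predecessors than there are earlier tasks.
        k = min(rng.choice(in_samples), new_task)
        candidates = list(range(new_task))
        weights = [out_deg[t] + 1 for t in candidates]
        preds = set()
        while len(preds) < k:
            preds.add(rng.choices(candidates, weights=weights)[0])
        # Edges always point from a lower id to the new, higher id,
        # so adding them cannot create a cycle.
        for p in sorted(preds):
            scaled.append((p, new_task))
            out_deg[p] += 1
    return scaled


# A small fork-join graph with 4 tasks, scaled up to 16 tasks.
original = [(0, 1), (0, 2), (1, 3), (2, 3)]
scaled = scale_task_graph(original, num_tasks=4, target_tasks=16)
```

Because every added edge runs from an existing task to a newer one, the scaled graph stays acyclic whenever the original edges already respect the id ordering. A real benchmark generator would also have to carry communication volumes and timing, which this sketch leaves out.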


  • Published in

    CODES '16: Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
    October 2016
    294 pages
    ISBN:9781450344838
    DOI:10.1145/2968456

    Copyright © 2016 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    Overall Acceptance Rate: 280 of 864 submissions, 32%
