Post-silicon platform for the functional diagnosis and debug of networks-on-chip

Published: 28 March 2014

Abstract

The increasing number of units in today's systems-on-chip and multicore processors has led to complex intra-chip communication solutions. Specifically, Networks-on-Chip (NoCs) have emerged as a favorable fabric for providing high bandwidth and low latency when connecting many units on the same chip. To achieve these goals, the NoC often includes complex components and advanced features, resulting in large and highly complex interconnect subsystems. One of the biggest challenges in these designs is ensuring the correct functionality of this communication infrastructure. To support this goal, an increasing fraction of the validation effort has shifted to post-silicon validation, because it permits exercising network activities that are too complex to validate in pre-silicon. However, post-silicon validation is hindered by the lack of observability of the network's internal operations, and thus diagnosing functional errors during this phase is very difficult.

In this work, we propose a post-silicon validation platform that improves the observability of network operations by taking periodic snapshots of the traffic traversing the network. Each node's local cache is configured to temporarily store the snapshot logs in a designated area that is reserved for post-silicon validation and relinquished after product release. Each snapshot log is analyzed locally by a software algorithm running on its corresponding core in order to detect functional errors. Upon error detection, all snapshot logs are aggregated at a central location to extract additional debug data, including an overview of the network traffic surrounding the error event and a partial reconstruction of the routes followed by packets in flight at the time. In our experiments, we found that this approach allows us to detect several types of functional errors, to observe, on average, over 50% of the network's traffic, and to reconstruct at least half of the route of each observed packet.
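
The two-stage flow described above (a local check on each node's snapshot log, followed by central aggregation and route reconstruction) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the log entry format (`Observation`), the lingering-packet check (`local_check`), the `max_age` threshold, and the assumption that cycle counts are comparable across nodes are all hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One snapshot-log entry (hypothetical format, not taken from the paper)."""
    cycle: int      # cycle at which the snapshot captured the packet
    node: int       # router that recorded the entry
    packet_id: int  # identifier carried by the packet
    dst: int        # destination router of the packet


def local_check(log, max_age):
    """Hypothetical per-node check: flag packets that linger at one router.

    Returns the sorted ids of packets whose first and last sightings in this
    node's log are more than max_age cycles apart.
    """
    first_seen, last_seen = {}, {}
    for obs in log:
        pid = obs.packet_id
        first_seen[pid] = min(obs.cycle, first_seen.get(pid, obs.cycle))
        last_seen[pid] = max(obs.cycle, last_seen.get(pid, obs.cycle))
    return sorted(p for p in first_seen if last_seen[p] - first_seen[p] > max_age)


def reconstruct_routes(logs):
    """Merge all per-node logs and order each packet's sightings by cycle.

    Assumes cycle counts are comparable across nodes. The result is only a
    partial route: routers that never sampled the packet are missing.
    """
    sightings = defaultdict(list)
    for log in logs.values():
        for obs in log:
            sightings[obs.packet_id].append((obs.cycle, obs.node))
    routes = {}
    for pid, seen in sightings.items():
        route = []
        for _, node in sorted(seen):
            if not route or route[-1] != node:  # collapse repeated sightings
                route.append(node)
        routes[pid] = route
    return routes


# Made-up logs for three routers, keyed by node id.
logs = {
    0: [Observation(10, 0, 7, 3)],
    1: [Observation(20, 1, 7, 3), Observation(30, 1, 8, 2), Observation(95, 1, 7, 3)],
    2: [Observation(40, 2, 8, 2)],
}

for node, log in logs.items():
    suspects = local_check(log, max_age=50)
    if suspects:
        # An error was flagged locally, so aggregate every log centrally.
        print(f"node {node} flagged packets {suspects}")
        print(reconstruct_routes(logs))
```

In this made-up data, packet 7 is seen at router 1 at cycles 20 and 95, so the local check on that node flags it, which triggers aggregation of all logs. Because snapshots are periodic samples, the reconstructed routes are expected to contain gaps.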

    • Published in

      ACM Transactions on Embedded Computing Systems  Volume 13, Issue 3s
      Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
      March 2014
      403 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/2597868
      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 March 2014
      • Accepted: 1 October 2013
      • Revised: 1 August 2013
      • Received: 1 December 2012

      Qualifiers

      • research-article
      • Research
      • Refereed
