skip to main content
research-article

Heuristic search for adaptive, defect-tolerant multiprocessor arrays

Published: 21 March 2013 Publication History

Abstract

In this article, new heuristic-search methods and algorithms are presented for enabling highly efficient and adaptive, defect-tolerant multiprocessor arrays. We consider systems where a homogeneous multiprocessor array lies on top of reconfigurable interconnects which allow the pipeline stages of the processors to be connected in all possible configurations. Considering the multiprocessor array partitioned in substitutable units at the granularity of pipeline stages, we employ a variety of heuristic-search methods and algorithms to isolate and replace defective units. The proposed heuristics are designed for off-line execution and aim at minimizing the performance overhead necessarily introduced to the array by the interconnects' latency. An empirical evaluation of the designed algorithms is then carried out, in order to assess the targeted problem and the efficacy of our approach. Our findings indicate this to be a NP-complete computational problem, however, our heuristic-search methods can achieve, for the problem sizes we exhaustively searched, 100% accuracy in finding the optimal solution among 1019 possible candidates within 2.5 seconds. Alternatively, they can provide near-optimal solutions at an accuracy which consistently exceeds 70% (compared to the optimal solution) in only 10-4 seconds.

References

[1]
Aggarwal, N., Ranganathan, P., Jouppi, N. P., and Smith, J. E. 2007. Configurable isolation: building high availability systems with commodity multi-core processors. In Proceedings of ISCA'07, 470--481.
[2]
Austin, T., Bertacco, V., Mahlke, S., and Cao, Y. 2008. Reliable systems on unreliable fabrics. IEEE Des. Test 25, 4, 322--332.
[3]
Borkar, S. 2005. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE, Micro 25, 6, 10--16.
[4]
Cormen, T., Leiserson, C., Rivest, R., and Stein, C. 2003. Introduction to Algorithms 2nd Ed. McGraw-Hill Science/Engineering/Math.
[5]
D. Bhaduri, S. K. Shukla, P. G. M. G. 2005. Reliability analysis of fault-tolerant reconfigurable architectures. In Proceedings of the IEEE International Workshop on Design & Test of Defect-Tolerant Nanoscale Archit (NANOARCH).
[6]
Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., San Francisco, CA.
[7]
Gold, B. T., Falsafi, B., and Hoe, J. C. 2009. Chip-level redundancy in distributed shared-memory multiprocessors. In Proceedings of the 15th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'09). 195--201.
[8]
Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning 1st Ed. Addison-Wesley Longman Publishing Co., Inc., Boston, MA.
[9]
Gupta, S., Ansari, A., Feng, S., and Mahlke, S. 2010. StageWeb: Interweaving pipeline stages into a wearout and variation tolerant CMP fabric. Proceedings of the International Conference on Dependable Systems and Networks, 101--110.
[10]
Gupta, S., Feng, S., Ansari, A., Blome, J., and Mahlke, S. 2008. The stagenet fabric for constructing resilient multicore systems. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO-41). 141--151.
[11]
Gupta, S., Feng, S., Ansari, A., and Mahlke, S. 2011. Stagenet: A reconfigurable fabric for constructing dependable cmps. IEEE Trans. Comput. 60, 1, 5--19.
[12]
Gupta, S., Feng, S., Blome, J., and Mahlke, S. 2007. Stagenet: A reconfigurable cmp fabric for resilient systems. In Proceedings of the Reconfigurable and Adaptive Architecture Workshop (RAAW).
[13]
Hauck, S. and DeHon, A. 2007. Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation. Morgan Kaufmann Publishers Inc., San Francisco, CA.
[14]
Hennessy, J. L. and Patterson, D. A. 2007. Computer Architecture—A Quantitative Approach 4th Ed.
[15]
Hoos, H. H. and Stuetzle, T. 2004. Stochastic Local Search: Foundations and Applications.
[16]
J. Emmert, C. Stroud, B. S., and Abramovici, M. 2000. Dynamic fault tolerance in fpgas via partial reconfiguration. In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines.
[17]
J. M. Emmert, C. E. S. and Abramovici, M. 2007. Online fault tolerance for fpga logic blocks. IEEE Trans. VLSI Syst. 15, 216--226.
[18]
Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. 1983. Optimization by simulated annealing. Science 220, 4598, 671--680.
[19]
Knuth, D. E. 2011. The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1. Addison-Wesley Professional.
[20]
Powell, M. D., Biswas, A., Gupta, S., and Mukherjee, S. S. 2009. Architectural core salvaging in a multi-core processor for hard-error tolerance. In Proceedings of the International Symposium on Computer Architecture (ISCA'09). 93--104.
[21]
Ramirez, A., Cabarcas, F., Juurlink, B., Mesa, M. A., Sanchez, F., Azevedo, A., Meenderinck, C., Ciobanu, C., Isaza, S., and Gaydadjiev, G. 2010. The sarc architecture. IEEE Micro 30, 16--29.
[22]
Ramos, J., Samson, J., Lupia, D., Troxel, I., Subramaniyan, R., A. Jacobs, J. Greco, G. C., Curreri, J., Fischer, M., Grobelny, E., Georgr, A., and Some, R. 2006. High-performance, dependable multiprocessor. In Proceedings of the IEEE Aerospace Conference.
[23]
Riordan, T., Grewal, G., Hsu, S., Kinsel, J., Libby, J., March, R., Mills, M., Ries, P., and Scofield, R. 1988. The mips m2000 system. In Proceedings of the ICCD. 366--369.
[24]
Romanescu, B. F. and Sorin, D. J. 2008. Core cannibalization architecture: improving lifetime chip performance for multicore processors in the presence of hard faults. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'08). 43--51.
[25]
Rosen, K. H. 2002. Discrete Mathematics and Its Applications 5th Ed. McGraw-Hill Higher Education.
[26]
Shuler, Robert, J. 2010. Fpgas with reconfigurable fault-tolerant redundancy. NASA Tech briefs MSC-24464-1, NASA Center, Johnson Space Center.
[27]
Smaragdos, G. 2012. An adaptive defect-tolerant multiprocessor array architecture. M.S. thesis, Delft University of Technology.
[28]
Smolens, J. C., Gold, B. T., Falsafi, B., and Hoe, J. C. 2006. Reunion: Complexity-effective multicore redundancy. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO'39). 223--234.
[29]
Stuart, R. and Peter, N. 2003. Artificial Intelligence: A Modern Approach 2nd Ed. Prentice Hall.
[30]
Sylvester, D., Blaauw, D., and Karl, E. 2006. Elastic: An adaptive self-healing architecture for unpredictable silicon. Proceedings of the Desi. Test Comput. 23, 6, 484--490.
[31]
Tzilis, S., Sourdis, I., and Gaydadjiev, G. 2010. Fine-grain fault diagnosis for fpga logic blocks. In Proceedings of the International Conference on Field-Programmable Technology (FPT 2010). Beijing, China.
[32]
Vasilikos, V. 2011. Heuristic search for defect tolerant adaptive multiprocessor arrays. M.S. thesis.
[33]
Černý, V. 1985. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. J. Optimi. Theory Appli. 45, 1, 41--51.

Cited By

View all
  • (2016)Runtime management of adaptive MPSoCs for graceful degradationProceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems10.1145/2968455.2968517(1-10)Online publication date: 1-Oct-2016
  • (2016)Resilient Chip Multiprocessors with Mixed-Grained ReconfigurabilityIEEE Micro10.1109/MM.2015.736:1(35-45)Online publication date: 1-Jan-2016
  • (2014)A Dependable Coarse-Grain Reconfigurable Multicore ArrayProceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops10.1109/IPDPSW.2014.20(141-150)Online publication date: 19-May-2014
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems  Volume 12, Issue 1s
Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
March 2013
701 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2435227
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

Publication History

Published: 21 March 2013
Accepted: 01 September 2012
Revised: 01 November 2011
Received: 01 July 2011
Published in TECS Volume 12, Issue 1s

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Heuristic methods
  2. adaptable architectures
  3. interconnection architectures
  4. parallel processors
  5. pipeline processors

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2016)Runtime management of adaptive MPSoCs for graceful degradationProceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems10.1145/2968455.2968517(1-10)Online publication date: 1-Oct-2016
  • (2016)Resilient Chip Multiprocessors with Mixed-Grained ReconfigurabilityIEEE Micro10.1109/MM.2015.736:1(35-45)Online publication date: 1-Jan-2016
  • (2014)A Dependable Coarse-Grain Reconfigurable Multicore ArrayProceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops10.1109/IPDPSW.2014.20(141-150)Online publication date: 19-May-2014
  • (2014)A runtime manager for gracefully degrading SoCs2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)10.1109/DFT.2014.6962106(216-221)Online publication date: Oct-2014

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media