
Efficient architectural design space exploration via predictive modeling

Published: 30 January 2008

Abstract

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable. We attack this via an automated approach that builds accurate predictive models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models are very fast to query, enabling efficient discovery of design tradeoffs. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1--2% error. In an experimental study using the approach, training on 1% of a 250-K-point CMP design space allows our models to predict performance with only 4--5% error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three to four orders of magnitude.
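
To make the workflow concrete, the following minimal sketch (an illustration, not code from the paper) trains a small neural-network regressor on a handful of simulated sample points and then queries it for an unsimulated configuration. It assumes NumPy and scikit-learn; the design parameters, network size, and IPC values are hypothetical placeholders rather than the authors' actual models or data.

    # Minimal illustrative sketch, not the authors' exact setup: learn a mapping from
    # design parameters to IPC using a few simulated sample points, then query the
    # learned model for unsimulated configurations. Requires NumPy and scikit-learn.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import MinMaxScaler

    # Hypothetical design parameters: (issue width, L2 size in KB, ROB entries).
    sampled_configs = np.array([
        [2,  256,  64],
        [4,  512, 128],
        [4, 1024, 128],
        [8, 2048, 256],
        [8, 4096, 512],
    ], dtype=float)

    # IPC values that detailed simulation of the sampled points would produce
    # (made-up numbers for illustration only).
    simulated_ipc = np.array([0.9, 1.4, 1.6, 2.1, 2.3])

    # Normalize parameters so no single dimension dominates training.
    scaler = MinMaxScaler()
    X = scaler.fit_transform(sampled_configs)

    # Small feed-forward network trained on the sampled points.
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    model.fit(X, simulated_ipc)

    # Querying the model is far cheaper than running another detailed simulation.
    query = scaler.transform(np.array([[4, 2048, 256]], dtype=float))
    print("Predicted IPC: %.2f" % model.predict(query)[0])

In practice, one would train on a statistically sampled fraction of the full design space (the paper reports roughly 1% of a 250-K-point CMP space) and validate the model against held-out simulated points before using its predictions to steer exploration.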





      Published In

      ACM Transactions on Architecture and Code Optimization, Volume 4, Issue 4
      January 2008
      187 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/1328195
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 January 2008
      Accepted: 01 March 2007
      Revised: 01 December 2006
      Received: 01 August 2006
      Published in TACO Volume 4, Issue 4


      Author Tags

      1. Artificial neural networks
      2. design space exploration
      3. performance prediction
      4. sensitivity studies

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Cited By

      • (2022) Phronesis: Efficient Performance Modeling for High-dimensional Configuration Tuning. ACM Transactions on Architecture and Code Optimization 19, 4, 1-26. https://doi.org/10.1145/3546868. Online publication date: 16-Sep-2022.
      • (2022) Prediction Modeling for Application-Specific Communication Architecture Design of Optical NoC. ACM Transactions on Embedded Computing Systems 21, 4, 1-29. https://doi.org/10.1145/3520241. Online publication date: 23-Aug-2022.
      • (2022) A Non-Parametric Histogram Interpolation Method for Design Space Exploration. Journal of Mechanical Design 144, 8. https://doi.org/10.1115/1.4054085. Online publication date: 8-Apr-2022.
      • (2021) Toward a general framework for jointly processor-workload empirical modeling. The Journal of Supercomputing 77, 6, 5319-5353. https://doi.org/10.1007/s11227-020-03475-9. Online publication date: 1-Jun-2021.
      • (2021) A Power-Aware Hybrid Cache for Chip-Multi Processors Based on Neural Network Prediction Technique. International Journal of Parallel Programming 49, 3, 326-346. https://doi.org/10.1007/s10766-021-00691-5. Online publication date: 1-Jun-2021.
      • (2021) Improving power-performance via hybrid cache for chip many cores based on neural network prediction technique. Microsystem Technologies 27, 8, 2995-3006. https://doi.org/10.1007/s00542-020-05048-5. Online publication date: 1-Aug-2021.
      • (2020) Methods to Optimize Carbon Footprint of Buildings in Regenerative Architectural Design with the Use of Machine Learning, Convolutional Neural Network, and Parametric Design. Energies 13, 20, 5289. https://doi.org/10.3390/en13205289. Online publication date: 12-Oct-2020.
      • (2020) Deffe. Proceedings of the 17th ACM International Conference on Computing Frontiers, 182-191. https://doi.org/10.1145/3387902.3392633. Online publication date: 11-May-2020.
      • (2020) A Machine Learning Methodology for Cache Memory Design Based on Dynamic Instructions. ACM Transactions on Embedded Computing Systems 19, 2, 1-20. https://doi.org/10.1145/3376920. Online publication date: 11-Mar-2020.
      • (2020) A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 99-110. https://doi.org/10.1109/HPCA47549.2020.00018. Online publication date: Feb-2020.
