DOI: 10.1145/2581122.2544142

Warm-Up Simulation Methodology for HW/SW Co-Designed Processors

Published: 15 February 2014

ABSTRACT

Evaluation techniques in microprocessor design are mostly based on simulating selected application samples using a cycle-accurate simulator. To obtain accurate results, microarchitectural structures are warmed up for a few million instructions before statistics collection begins. Unfortunately, this strategy cannot be applied to HW/SW co-designed processors, in which a Transparent Optimization software Layer (TOL) translates and optimizes code on the fly from a guest ISA to a custom internal host microarchitecture. We show that the warm-up period in this case must be 3-4 orders of magnitude longer than for traditional microprocessor designs, because the TOL state must be warmed up as well.

In this paper, we propose a novel simulation technique for HW/SW co-designed processors that adapts the optimization promotion thresholds using high-level application statistics to find the best trade-off between accuracy and simulation cost. In particular, the proposed technique reduces simulation cost by 65X with an average error of just 0.75%. Furthermore, unlike other alternatives, it satisfies the additional requirement of allowing evaluation with different TOL and microarchitectural configurations.
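To make the idea of promotion thresholds concrete, the toy model below may help. It is only an illustrative sketch, not the algorithm from the paper: the three tiers, the function names `tol_state` and `scaled_thresholds`, the threshold values, and the uniform scaling factor are all assumptions chosen for the example. The abstract says the thresholds are adapted using high-level application statistics, and this sketch replaces that step with a single fixed scale.

```python
from collections import Counter

def tol_state(block_trace, bb_threshold, sb_threshold):
    """Return the tier each code block reaches after the trace executes.

    Hypothetical tiers: a block is 'interpreted' until its execution count
    reaches bb_threshold, then 'translated', and 'optimized' once the count
    reaches sb_threshold.
    """
    tiers = {}
    for block, count in Counter(block_trace).items():
        if count >= sb_threshold:
            tiers[block] = "optimized"
        elif count >= bb_threshold:
            tiers[block] = "translated"
        else:
            tiers[block] = "interpreted"
    return tiers

def scaled_thresholds(bb_threshold, sb_threshold, scale):
    """Divide both thresholds by scale (an assumed uniform factor), keeping them >= 1."""
    return max(1, bb_threshold // scale), max(1, sb_threshold // scale)

# Assumed workload: a full-length warm-up and one that is 10x shorter.
FULL = ["A"] * 10000 + ["B"] * 500 + ["C"] * 20
SHORT = ["A"] * 1000 + ["B"] * 50 + ["C"] * 2

reference = tol_state(FULL, 50, 5000)                          # long warm-up, original thresholds
naive = tol_state(SHORT, 50, 5000)                             # short warm-up, original thresholds
adapted = tol_state(SHORT, *scaled_thresholds(50, 5000, 10))   # short warm-up, scaled thresholds

print("reference:", reference)
print("naive:    ", naive)
print("adapted:  ", adapted)
```

In this toy setup the short warm-up with the original thresholds leaves the hot block `A` below the optimization tier, while the scaled thresholds bring every block to the same tier as the long warm-up. A real TOL has far more state than a per-block tier, so this shows only the direction of the trade-off, not its magnitude.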

        Published in

          CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
          February 2014
          328 pages
          ISBN:9781450326704
          DOI:10.1145/2581122

          Copyright © 2014 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • tutorial
          • Research
          • Refereed limited

          Acceptance Rates

          CGO '14 Paper Acceptance Rate: 29 of 100 submissions, 29%
          Overall Acceptance Rate: 312 of 1,061 submissions, 29%
