DOI: 10.1145/2597917.2597937
research-article

Accurate off-line phase classification for HW/SW co-designed processors

Published: 20 May 2014

ABSTRACT

Evaluation techniques in microprocessor design mostly rely on simulating selected samples of an application with a cycle-accurate simulator. These samples usually correspond to different phases of the application stream. To identify these phases, relevant high-level application statistics are collected and clustered in a process called "Off-Line Phase Classification". The purpose of phase classification is to reduce the number of samples that must be simulated with minimal loss in accuracy (compared to simulating the complete set of samples).
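The workflow described above (collect per-sample statistics, cluster them, simulate one representative per cluster) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the feature vectors, the use of plain k-means seeded with the first `k` vectors, and the helper names `kmeans`, `pick_representatives` and `estimate_metric` are all assumptions made for the example.

```python
import math

def nearest(vector, centroids):
    """Index of the centroid closest to the vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vector, centroids[c]))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids (arbitrary choice)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        assignment = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(v, centroids) for v in vectors]

def pick_representatives(vectors, k):
    """Return (sample_index, weight) pairs, one per non-empty cluster (phase)."""
    centroids, assignment = kmeans(vectors, k)
    representatives = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if not members:
            continue
        # The representative is the sample closest to the cluster centroid.
        rep = min(members, key=lambda i: math.dist(vectors[i], centroids[c]))
        # The weight is the fraction of all samples that the phase covers.
        representatives.append((rep, len(members) / len(vectors)))
    return representatives

def estimate_metric(representatives, simulate):
    """Weighted estimate of a metric (e.g. CPI) from the simulated representatives only."""
    return sum(weight * simulate(index) for index, weight in representatives)

# Toy, made-up statistics for six samples with two features each.
samples = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
# Hypothetical per-sample CPI values standing in for a cycle-accurate simulator.
toy_cpi = [1.0, 2.0, 1.2, 1.1, 2.4, 2.2]

reps = pick_representatives(samples, k=2)
estimated_cpi = estimate_metric(reps, lambda i: toy_cpi[i])
```

Only the samples listed in `reps` would need to be run through the cycle-accurate simulator; every other sample is accounted for through the weight of its cluster.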

Unfortunately, when applied directly to HW/SW co-designed processors, traditional phase classification techniques do not provide a good trade-off between accuracy and the number of samples. As an example, according to our experimental results, achieving a 4% error (compared to simulating all the samples) requires simulating 2.5X more samples for HW/SW co-designed processors than for HW-only processors.
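The trade-off between accuracy and sample count can be made concrete with a small sketch. The abstract does not define the error metric, so the relative error used here is an assumption, as are the function names and the made-up error table; none of the numbers come from the paper.

```python
def relative_error(estimate, reference):
    """Relative error of a sampled estimate against the all-samples reference (assumed metric)."""
    return abs(estimate - reference) / abs(reference)

def fewest_samples_for(target_error, error_by_sample_count):
    """Smallest sample count whose error meets the target, or None if none does."""
    meeting = [n for n, err in error_by_sample_count.items() if err <= target_error]
    return min(meeting) if meeting else None

# Hypothetical error table: number of simulated samples -> observed relative error.
toy_errors = {5: 0.09, 10: 0.05, 20: 0.03, 40: 0.01}

needed = fewest_samples_for(0.04, toy_errors)
```

Comparing the value of `needed` obtained for two classification schemes at the same target error gives the kind of sample-count ratio the abstract reports.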

In this paper, we propose a novel off-line phase classification scheme called TOL Description Vector (TDV), which is suitable for HW/SW co-designed processors. TDV targets estimating the particularities of the TOL and, on average, gives significantly better accuracy than traditional phase classification for any number of selected samples. For instance, TDV reaches an average error of 3% with 3X fewer samples than traditional classification. These benefits hold across different TOL and microarchitecture configurations.


      Published in

      CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers
      May 2014
      305 pages
      ISBN: 9781450328708
      DOI: 10.1145/2597917

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CF '14 Paper Acceptance Rate: 28 of 62 submissions, 45%
      Overall Acceptance Rate: 240 of 680 submissions, 35%
