research-article

Accurate off-line phase classification for HW/SW co-designed processors

Authors:
Aleksandar Branković, Universitat Politècnica de Catalunya, Spain
Kyriakos Stavrou, Intel Barcelona Research Center, Intel Labs Barcelona, Spain
Enric Gibert, Intel Barcelona Research Center, Intel Labs Barcelona, Spain
Antonio González, Universitat Politècnica de Catalunya, Spain and Intel Barcelona Research Center, Intel Labs Barcelona, Spain
CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers, May 2014, Article No.: 5, Pages 1–10
https://doi.org/10.1145/2597917.2597937

Published: 20 May 2014

ABSTRACT

Evaluation techniques in microprocessor design are mostly based on simulating selected samples of an application using a cycle-accurate simulator. These samples usually correspond to different phases of the application stream. To identify these phases, relevant high-level application statistics are collected and clustered in a process called "Off-Line Phase Classification". The purpose of phase classification is to reduce the number of samples that need to be simulated with minimal loss in accuracy (compared to simulating the complete set of samples).

Unfortunately, when applied directly to HW/SW co-designed processors, traditional phase classification does not provide a good trade-off between accuracy and the number of samples. For example, according to our experimental results, achieving a 4% error (compared to simulating all the samples) requires simulating 2.5X more samples for HW/SW co-designed processors than for HW-only processors.

In this paper, we propose a novel off-line phase classification scheme called TOL Description Vector (TDV), which is suitable for HW/SW co-designed processors. TDV aims to estimate the TOL particularities and, on average, gives significantly better accuracy than traditional phase classification for any number of selected samples. For instance, TDV reaches an average error of 3% with 3X fewer samples than traditional classification. These benefits hold across different TOL and microarchitecture configurations.
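
As an illustration of the generic off-line phase classification flow described in the abstract (collect per-sample statistics, cluster them, simulate one representative per cluster), the following is a minimal sketch. It is not the TDV scheme and not the authors' implementation: the use of k-means with farthest-point seeding, the feature vectors, and all function names (`kmeans`, `select_representatives`, `estimate_metric`) are assumptions made for illustration only.

```python
import math


def farthest_point_init(vectors, k):
    """Deterministic seeding (an assumption of this sketch): start at
    vectors[0], then repeatedly add the point farthest from all seeds."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iters=100):
    """Cluster per-sample statistic vectors into k phases."""
    centroids = farthest_point_init(vectors, k)
    assign = []
    for _ in range(max_iters):
        # Assign every sample to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign, centroids


def select_representatives(vectors, assign, centroids):
    """Pick the sample closest to each centroid; weight it by cluster size."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        rep = min(members, key=lambda i: math.dist(vectors[i], centroid))
        reps[rep] = len(members) / len(vectors)
    return reps


def estimate_metric(reps, per_sample_metric):
    """Weighted estimate using only the simulated representatives."""
    return sum(weight * per_sample_metric[i] for i, weight in reps.items())


if __name__ == "__main__":
    # Hypothetical statistic vectors and per-sample CPI values.
    stats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    cpi = [1.0, 1.1, 0.9, 2.0, 2.2, 1.8]
    assign, centroids = kmeans(stats, k=2)
    reps = select_representatives(stats, assign, centroids)
    estimate = estimate_metric(reps, cpi)
    full = sum(cpi) / len(cpi)
    print("representatives:", sorted(reps))
    print("relative error:", abs(estimate - full) / full)
```

In this sketch only the representatives would be run through the cycle-accurate simulator; the relative error printed at the end corresponds to the "error compared to simulating all the samples" that the abstract uses as its accuracy measure.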

Index Terms

Accurate off-line phase classification for HW/SW co-designed processors
1. General and reference
  1. Cross-computing tools and techniques
    1. Design

Published in
CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers
May 2014
305 pages
ISBN: 9781450328708
DOI: 10.1145/2597917
General Chair: Pedro Trancoso, University of Cyprus, CY
Program Chairs: Diana Franklin, University of California at Santa Barbara; Sally A. McKee, Chalmers University of Technology, SE
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HW/SW co-designed processors
dynamic binary translation
simulation
warm-up methodology
Qualifiers
- research-article
Acceptance Rates
CF '14 Paper Acceptance Rate: 28 of 62 submissions, 45%
Overall Acceptance Rate: 240 of 680 submissions, 35%