article

Methodology for operation shuffling and L0 cluster generation for low energy heterogeneous VLIW processors

Authors:

Yuki Kobayashi
Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

Murali Jayapala
IMEC vzw., Leuven, Belgium

Praveen Raghavan
IMEC vzw., Katholieke Universiteit Leuven, Leuven, Belgium

Francky Catthoor
IMEC vzw., Katholieke Universiteit Leuven, Leuven, Belgium

Masaharu Imai
Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

ACM Transactions on Design Automation of Electronic Systems, Volume 12, Issue 4, pp 41–es
https://doi.org/10.1145/1278349.1278354

Published: 01 September 2007


Abstract

Clustering L0 buffers is an effective way to reduce energy in the instruction memory hierarchy of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. In heterogeneous or data-clustered VLIW processors in particular, finding an energy-efficient schedule is more tightly constrained.

This article proposes a realistic technique, supported by a tool flow, that explores operation shuffling to improve the generation of L0 clusters. The tool flow explores the assignment of operations in each cycle and generates various schedules, which makes it possible to reduce energy consumption across a range of processor architectures. Because the exploration space is huge, the computational complexity is high. We therefore also develop heuristics that shrink the exploration space while keeping solution quality reasonable. Finally, we propose a technique to support VLIW processors with multiple data clusters, which is essential for applying the methodology to real-world processors.

The experimental results indicate potential energy gains of up to 27.6% in the L0 buffers through operation shuffling, for heterogeneous processor architectures as well as a homogeneous one. The proposed heuristics reduce the exploration space by about 90%, while their results remain comparable to full search, with average differences of less than 1%. The results also indicate that the proposed methodology improves energy efficiency in most of the media benchmarks, with an average gain of around 10% compared with generating clusters without operation shuffling.
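
To make the idea of operation shuffling concrete, the following is a minimal Python sketch under a deliberately simplified model. It is not the article's tool flow or energy model. Everything named here is an assumption introduced for illustration: a four-slot machine, a fixed partition of slots into two L0 clusters (`CLUSTERS`), a capability table for a heterogeneous datapath (`CAPABLE`), and a cost that simply counts how many L0 buffers are read per cycle. The sketch exhaustively tries slot assignments within each cycle and keeps the one that activates the fewest clusters.

```python
from itertools import permutations

# Hypothetical toy model, NOT the article's tool flow or energy model.
# Four issue slots (0..3); each L0 cluster is a set of slots that share
# one L0 buffer. The partition below is an assumption for illustration.
CLUSTERS = [{0, 1}, {2, 3}]

# Assumed heterogeneous machine: slots able to execute each operation type.
CAPABLE = {"alu": {0, 1, 2, 3}, "mul": {1, 3}, "mem": {2}}


def active_clusters(slots):
    """Count the L0 buffers read in one cycle, given the occupied slots."""
    occupied = set(slots)
    return sum(1 for cluster in CLUSTERS if cluster & occupied)


def shuffle_cycle(ops, n_slots=4):
    """Try every legal slot assignment for one cycle's operations and
    return (cost, slots) for the first assignment with the fewest
    active clusters."""
    best = None
    for slots in permutations(range(n_slots), len(ops)):
        # Skip assignments that place an operation on an incapable slot.
        if not all(s in CAPABLE[op] for op, s in zip(ops, slots)):
            continue
        cost = active_clusters(slots)
        if best is None or cost < best[0]:
            best = (cost, slots)
    if best is None:
        raise ValueError("no legal slot assignment for this cycle")
    return best


def schedule_cost(schedule, shuffle):
    """Sum active clusters over all cycles, with or without shuffling."""
    total = 0
    for ops, slots in schedule:
        total += shuffle_cycle(ops)[0] if shuffle else active_clusters(slots)
    return total


# Each cycle: (operation types, slots chosen by an energy-unaware scheduler).
SCHEDULE = [
    (["alu", "mul"], (0, 3)),
    (["mem", "alu"], (2, 0)),
    (["alu"], (1,)),
]

print(schedule_cost(SCHEDULE, shuffle=False))  # buffer reads before shuffling
print(schedule_cost(SCHEDULE, shuffle=True))   # buffer reads after shuffling
```

Even in this toy setting the search is exhaustive within each cycle, which hints at why the full exploration space grows quickly and why heuristics are attractive. The sketch also keeps the clusters fixed and ignores data-cluster (register file) constraints, both of which the methodology described in the abstract takes into account.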


Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers


Published in

ACM Transactions on Design Automation of Electronic Systems Volume 12, Issue 4
September 2007
449 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/1278349

Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
Compilers for low energy
VLIW processors
loop buffers