Article

Vector LLVA: a virtual vector instruction set for media processing

Authors:

Robert L. Bocchino, Jr.,

Vikram S. Adve

VEE '06: Proceedings of the 2nd international conference on Virtual execution environments

Pages 46 - 56

https://doi.org/10.1145/1134760.1134769

Published: 14 June 2006

Abstract

We present Vector LLVA, a virtual instruction set architecture (V-ISA) that exposes extensive static information about vector parallelism while avoiding the use of hardware-specific parameters. We provide both arbitrary-length vectors (for targets that allow vectors of arbitrary length, or where the target length is not known) and fixed-length vectors (for targets that have a fixed vector length, such as subword SIMD extensions), together with a rich set of operations on both vector types. We have implemented translators that compile (1) Vector LLVA written with arbitrary-length vectors to the Motorola RSVP architecture and (2) Vector LLVA written with fixed-length vectors to both AltiVec and Intel SSE2. Our translator-generated code achieves speedups competitive with handwritten native code versions of several benchmarks on all three architectures. These experiments show that our V-ISA design captures vector parallelism for two quite different classes of architectures and provides virtual object code portability within the class of subword SIMD architectures.



Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers



Published In

VEE '06: Proceedings of the 2nd international conference on Virtual execution environments

June 2006

194 pages

ISBN: 1595933328

DOI: 10.1145/1134760

General Chair: Hans-J. Boehm, HP Labs, USA

Program Chair: David Grove, IBM Research, USA

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

VEE06: Second International Conference on Virtual Execution Environments

June 14 - 16, 2006

Ottawa, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 80 of 235 submissions, 34%

Bibliometrics

Total Citations: 23
Total Downloads: 426
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 17 Feb 2025

Cited By

Johnson E, Dharsee K, Criswell J, Sartor J, Naik M, Rossbach C. (2019). Secure guest virtual machine support in apparition. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 17-30. Online publication date: 14-Apr-2019.
https://dl.acm.org/doi/10.1145/3313808.3313809
Şuşu A. (2019). Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor. Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing, pages 1-8. Online publication date: 16-Feb-2019.
https://dl.acm.org/doi/10.1145/3303117.3306166
Bo P, Pottmann H, Kilian M, Wang W, Wallner J. (2018). Circular arc structures. ACM Transactions on Graphics (TOG), 30:4, pages 1-12. Online publication date: 23-Sep-2018.
https://dl.acm.org/doi/10.1145/2010324.1964996
Carroll R, Ramamoorthi R, Agrawala M. (2018). Illumination decomposition for material recoloring with consistent interreflections. ACM Transactions on Graphics (TOG), 30:4, pages 1-10. Online publication date: 23-Sep-2018.
https://dl.acm.org/doi/10.1145/2010324.1964938
Kirk A, O'Brien J. (2018). Perceptually based tone mapping for low-light conditions. ACM Transactions on Graphics (TOG), 30:4, pages 1-10. Online publication date: 23-Sep-2018.
https://dl.acm.org/doi/10.1145/2010324.1964937
Tocci M, Kiser C, Tocci N, Sen P. (2018). A versatile HDR video production system. ACM Transactions on Graphics (TOG), 30:4, pages 1-10. Online publication date: 23-Sep-2018.
https://dl.acm.org/doi/10.1145/2010324.1964936
Xu S, Gregg D. (2015). Efficient Exploitation of Hyper Loop Parallelism in Vectorization. Languages and Compilers for Parallel Computing, pages 382-396. Online publication date: 1-May-2015.
https://doi.org/10.1007/978-3-319-17473-0_25
Ahn J, Son Y, Kim J. (2013). Scalable high-radix router microarchitecture using a network switch organization. ACM Transactions on Architecture and Code Optimization, 10:3, pages 1-25. Online publication date: 16-Sep-2013.
https://dl.acm.org/doi/10.1145/2512433
Shobaki G, Shawabkeh M, Rmaileh N. (2013). Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach. ACM Transactions on Architecture and Code Optimization, 10:3, pages 1-31. Online publication date: 16-Sep-2013.
https://dl.acm.org/doi/10.1145/2512432
Bakhoda A, Kim J, Aamodt T. (2013). Designing on-chip networks for throughput accelerators. ACM Transactions on Architecture and Code Optimization, 10:3, pages 1-35. Online publication date: 16-Sep-2013.
https://dl.acm.org/doi/10.1145/2512429
