ABSTRACT
We present Vector LLVA, a virtual instruction set architecture (V-ISA) that exposes extensive static information about vector parallelism while avoiding the use of hardware-specific parameters. We provide both arbitrary-length vectors (for targets that allow vectors of arbitrary length, or where the target length is not known) and fixed-length vectors (for targets that have a fixed vector length, such as subword SIMD extensions), together with a rich set of operations on both vector types. We have implemented translators that compile (1) Vector LLVA written with arbitrary-length vectors to the Motorola RSVP architecture and (2) Vector LLVA written with fixed-length vectors to both AltiVec and Intel SSE2. Our translator-generated code achieves speedups competitive with handwritten native code versions of several benchmarks on all three architectures. These experiments show that our V-ISA design captures vector parallelism for two quite different classes of architectures and provides virtual object code portability within the class of subword SIMD architectures.
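The distinction between the two vector styles can be pictured with a small Python model. This is only an illustrative sketch, not Vector LLVA syntax and not the authors' translators: the names `add_arbitrary`, `add_fixed`, and `FIXED_WIDTH` are hypothetical, and the width of 4 lanes is an assumption chosen to resemble four 32-bit values in a 128-bit subword SIMD register.

```python
from typing import List, Sequence

# Hypothetical lane count (assumption): four 32-bit values per 128-bit register.
FIXED_WIDTH = 4


def add_arbitrary(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Arbitrary-length style: one whole-vector add, length taken from the operands."""
    if len(a) != len(b):
        raise ValueError("operand lengths differ")
    return [x + y for x, y in zip(a, b)]


def add_fixed(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Fixed-length style: a loop of FIXED_WIDTH-wide adds plus a scalar tail."""
    if len(a) != len(b):
        raise ValueError("operand lengths differ")
    out: List[int] = []
    full = len(a) - len(a) % FIXED_WIDTH  # elements covered by whole vectors
    for i in range(0, full, FIXED_WIDTH):
        # Each iteration stands for one fixed-length vector operation.
        chunk_a = a[i:i + FIXED_WIDTH]
        chunk_b = b[i:i + FIXED_WIDTH]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
    for i in range(full, len(a)):
        # Leftover elements that do not fill a whole vector.
        out.append(a[i] + b[i])
    return out
```

Both functions compute the same result; they differ only in whether the vector length is left open or fixed in the code itself, which is the choice the abstract describes between the two vector types.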
Index Terms
- Vector LLVA: a virtual vector instruction set for media processing