ABSTRACT
We present Vector LLVA, a virtual instruction set architecture (V-ISA) that exposes extensive static information about vector parallelism while avoiding the use of hardware-specific parameters. We provide both arbitrary-length vectors (for targets that allow vectors of arbitrary length, or where the target length is not known) and fixed-length vectors (for targets that have a fixed vector length, such as subword SIMD extensions), together with a rich set of operations on both vector types. We have implemented translators that compile (1) Vector LLVA written with arbitrary-length vectors to the Motorola RSVP architecture and (2) Vector LLVA written with fixed-length vectors to both AltiVec and Intel SSE2. Our translator-generated code achieves speedups competitive with handwritten native code versions of several benchmarks on all three architectures. These experiments show that our V-ISA design captures vector parallelism for two quite different classes of architectures and provides virtual object code portability within the class of subword SIMD architectures.