ACM Home Page
Please provide us with feedback. Feedback
Generation of permutations for SIMD processors
Full text PdfPdf (356 KB)
Source ACM SIGPLAN Notices archive
Volume 40 ,  Issue 7  (July 2005) table of contents
Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
SESSION: Hardware supported optimization table of contents
Pages: 147 - 156  
Year of Publication: 2005
ISSN:0362-1340
Also published in ...
Authors
Alexei Kudriavtsev  University of Notre Dame
Peter Kogge  University of Notre Dame
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 106,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1070891.1065931
What is a DOI?

ABSTRACT

Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still do not have good support for SIMD instructions, and often the code has to be written manually in assembly language or using compiler builtin functions. Also, in some applications, higher parallelism could be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code, and determine ordering of individual operations in the SIMD instructions to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from SIMD memory operations. Then, the orderings of operations are propagated from SIMD memory operations into the graph.We also describe our approach to compute decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in SIMD instructions (SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
Intel Corporation. Intel® C++ Compiler for Linux* Systems User's Guide, 2003.
5
 
6
 
7
8
 
9
10

CITED BY  8
 
 

Collaborative Colleagues:
Alexei Kudriavtsev: colleagues
Peter Kogge: colleagues