Abstract
Parallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF-8 to UTF-16 transcoding, XML parsing, string search and regular expression matching. Direct architectural support for these algorithms in future SWAR instruction sets could further increase performance as well as simplifying the programming task. A set of simple SWAR instruction set extensions are proposed for this purpose based on the principle of systematic support for inductive doubling as an algorithmic technique. These extensions are shown to significantly reduce instruction count in core parallel bit stream algorithms, often providing a 3X or better improvement. The extensions are also shown to be useful for SWAR programming in other application areas, including providing a systematic treatment for horizontal operations. An implementation model for these extensions involves relatively simple circuitry added to the operand fetch components in a pipelined processor.
- Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The landscape of parallel computing research: A view from berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006.Google Scholar
- Robert D. Cameron. u8u16 -- a high-speed UTF-8 to UTF-16 transcoder using parallel bit streams. Technical Report TR 2007-18, Simon Fraser University, Burnaby, BC, Canada, 2007.Google Scholar
- Robert D. Cameron. A case study in SIMD text processing with parallel bit streams. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), Salt Lake City, Utah, 2008. Google ScholarDigital Library
- Robert D. Cameron, Kenneth S. Herdy, and Dan Lin. High performance XML parsing using parallel bit stream technology. In CASCON '08: Proceedings of the 2008 conference of the Centre for Advanced Studies on Collaborative research, Toronto, Ontario, Canada, 2008. Google ScholarDigital Library
- James R. Green, Hanan Mahmoud, and Michel Dumontier. Towards real time protein identification using the Cell BE. In Workshop on Cell BE and Heterogeneous Multicore Systems: Architectures and Applications at CASCON '08, Toronto, Ontario, Canada, 2008.Google Scholar
- Kenneth S. Herdy, David S. Burggraf, and Robert D. Cameron. High performance GML to SVG transformation for the visual presentation of geographic data in web-based mapping systems. In Proceedings of SVG Open 2008, Nuremburg, Germany, 2008.Google Scholar
- Donald E. Knuth. The Art of Computer Programming Volume 4 Pre-Fascicle 1A: Bitwise Tricks and Techniques. Draft of 22 December 2008, Stanford University.Google Scholar
- R. Raman and D.S. Wise. Converting to and from dilated integers. IEEE Transactions on Computers, 57(4):567--573, 2008. Google ScholarDigital Library
- Chester Rebeiro, A. David Selvakumar, and A. S. L. Devi. Bitslice implementation of AES. In David Pointcheval, Yi Mu, and Kefei Chen, editors, CANS, volume 4301 of Lecture Notes in Computer Science, pages 203--212. Springer, 2006. Google ScholarDigital Library
- Leo Stocco and Günther Schrack. Integer dilation and contraction for quadtrees and octrees. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, pages 426--428, Victoria, B.C., 1995.Google ScholarCross Ref
- Henry S. Warren. Hacker's Delight. Addison-Wesley, 2002. Google ScholarDigital Library
Index Terms
- Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle
Recommendations
Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle
ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systemsParallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF-8 to UTF-16 transcoding, XML parsing, string search and regular expression ...
Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle
ASPLOS 2009Parallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF-8 to UTF-16 transcoding, XML parsing, string search and regular expression ...
A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding
PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programmingHigh performance SIMD text processing using the method of parallel bit streams is introduced with a case study of UTF-8 to UTF-16 transcoding. A forward transform converts byte-oriented character stream data into eight parallel bit streams. Decoding, ...
Comments