Improving the memory behavior of vertical filtering in the discrete wavelet transform
Source: Proceedings of the 3rd Conference on Computing Frontiers, Ischia, Italy
Session: Applications I
Pages: 253-260
Year of Publication: 2006
ISBN: 1-59593-302-6
Authors
Asadollah Shahbahrami  Delft University of Technology, The Netherlands
Ben Juurlink  Delft University of Technology, The Netherlands
Stamatis Vassiliadis  Delft University of Technology, The Netherlands
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1128022.1128056

ABSTRACT

The discrete wavelet transform (DWT) is used in several image and video compression standards, in particular JPEG2000. A 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. It is well known that a straightforward implementation of vertical filtering (assuming a row-major layout) induces many cache misses due to a lack of spatial locality. These misses can be avoided by interchanging the loops. This paper shows, however, that the resulting implementation suffers significantly from 64K aliasing, which occurs on the Pentium 4 when two data blocks that are a multiple of 64 KB apart are accessed, and we propose two techniques to avoid it. In addition, if the filter is longer than four taps, the associativity of the Pentium 4's L1 data cache is insufficient to avoid cache conflict misses. We therefore propose two methods for reducing conflict misses. Although the experimental results have been collected on the Pentium 4, the techniques are general and can also be applied to other processors with different cache organizations. Compared to already optimized baseline implementations, the proposed techniques improve the performance of vertical filtering by a factor of 3.11 for the (5,3) lifting scheme, by a factor of 3.11 for Daubechies' four-coefficient transform, and by a factor of 1.99 for the Cohen, Daubechies, and Feauveau 9/7 transform.
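The difference between the straightforward column-at-a-time traversal and the loop-interchanged traversal described above can be illustrated with a minimal C sketch. It is not the authors' code: the function names (`vert53_column_at_a_time`, `vert53_row_at_a_time`, `padded_pitch`, `fdiv`), the integer (5,3) lifting formulation with symmetric boundary extension, the restriction to an even image height, and the "pad rows that are a multiple of 1 KB by one 64-byte cache line" heuristic are all assumptions made for illustration. Row padding is shown only as one generic way to break power-of-two row strides; the abstract does not say that it is one of the techniques the paper proposes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Floor division for b > 0 (C's '/' truncates toward zero). */
static int fdiv(int a, int b) {
    int q = a / b;
    if (a % b != 0 && a < 0) q--;
    return q;
}

/* Hypothetical padding heuristic: if a row occupies a multiple of 1 KB,
   append one 64-byte cache line so that neighbouring rows are no longer
   separated by a large power-of-two stride. Returns the pitch in ints. */
static int padded_pitch(int w) {
    size_t row_bytes = (size_t)w * sizeof(int);
    return (row_bytes % 1024 == 0) ? w + (int)(64 / sizeof(int)) : w;
}

/* Straightforward version: integer (5,3) lifting down each column in turn.
   With a row-major layout every access jumps `pitch` ints (poor locality). */
static void vert53_column_at_a_time(int *img, int w, int h, int pitch) {
    assert(h >= 2 && h % 2 == 0);
    for (int x = 0; x < w; x++) {
        for (int y = 1; y < h; y += 2) {              /* predict: odd rows */
            int up = img[(y - 1) * pitch + x];
            int dn = (y + 1 < h) ? img[(y + 1) * pitch + x] : up; /* symmetric ext. */
            img[y * pitch + x] -= fdiv(up + dn, 2);
        }
        for (int y = 0; y < h; y += 2) {              /* update: even rows */
            int dn = img[(y + 1) * pitch + x];
            int up = (y > 0) ? img[(y - 1) * pitch + x] : dn;     /* symmetric ext. */
            img[y * pitch + x] += fdiv(up + dn + 2, 4);
        }
    }
}

/* Loop-interchanged version: same arithmetic, but the inner loop walks
   along a row, so every access has unit stride. */
static void vert53_row_at_a_time(int *img, int w, int h, int pitch) {
    assert(h >= 2 && h % 2 == 0);
    for (int y = 1; y < h; y += 2) {                  /* predict: odd rows */
        const int *up = img + (size_t)(y - 1) * pitch;
        const int *dn = (y + 1 < h) ? img + (size_t)(y + 1) * pitch : up;
        int *row = img + (size_t)y * pitch;
        for (int x = 0; x < w; x++)
            row[x] -= fdiv(up[x] + dn[x], 2);
    }
    for (int y = 0; y < h; y += 2) {                  /* update: even rows */
        const int *dn = img + (size_t)(y + 1) * pitch;
        const int *up = (y > 0) ? img + (size_t)(y - 1) * pitch : dn;
        int *row = img + (size_t)y * pitch;
        for (int x = 0; x < w; x++)
            row[x] += fdiv(up[x] + dn[x] + 2, 4);
    }
}
```

Both functions produce identical coefficients; only the traversal order differs. The interchanged version streams each row with unit stride, but it still keeps several rows live at once (three per lifting step here, more for longer filters). That is where row strides that are large powers of two can map those rows onto aliasing addresses or the same cache sets, which is the problem the paper addresses.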


REFERENCES


[1] G. Bernabe, J. M. Garcia, and J. Gonzales. Reducing 3D Wavelet Transform Execution Time Through the Streaming SIMD Extensions. In Proc. 11th Euromicro Conf. on Parallel, Distributed and Network-Based Processing, February 2003.

[2] R. E. Bryant and D. R. O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice Hall, 2003.

[3] S. Chatterjee and C. D. Brooks. Cache-Efficient Wavelet Lifting in JPEG 2000. In Proc. IEEE Int. Conf. on Multimedia, pages 797-800, August 2002.

[6] B. D. Choi, K. S. Choi, M. C. Hwang, J. K. Cho, and S. J. Ko. Real-Time DSP Implementation of Motion-JPEG2000 Using Overlapped Block Transferring and Parallel-Pass Methods. Real-Time Imaging, 10:277-284, 2004.

[7] C. Chrysafis and A. Ortega. Line-Based, Reduced Memory, Wavelet Image Compression. IEEE Trans. on Image Processing, 9(3):378-389, March 2000.

[8] A. Cohen, I. Daubechies, and J.-C. Feauveau. Biorthogonal Bases of Compactly Supported Wavelets. Communications on Pure and Appl. Math., 45(5):485-560, June 1992.

[9] I. Daubechies and W. Sweldens. Factoring Wavelet Transforms into Lifting Steps. Journal of Fourier Analysis and Applications, 4(3):247-269, 1998.

[10] D. He and W. Zhang. The Parallel Algorithm of 2-D Discrete Wavelet Transform. In Proc. 4th IEEE Int. Conf. on Parallel and Distributed Computing, Applications and Technologies, pages 738-741, August 2003.

[11] Intel Corporation. IA-32 Intel Architecture Optimization, 2004. Order Number: 248966-011.

[12] Intel Corporation. The IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2004. Order Number: 253668.

[14] P. Meerwald, R. Norcen, and A. Uhl. Cache Issues with JPEG2000 Wavelet Lifting. In Proc. of Visual Communications and Image Processing, January 2002.

[15] M. Rabbani and R. Joshi. An Overview of the JPEG2000 Still Image Compression Standard. Signal Processing: Image Communication, 17(1):3-48, January 2002.

[16] J. A. Shafer. Embedded Vector Processor Architecture for Real-Time Wavelet Video Compression. Master's thesis, Department of Electrical and Computer Eng., University of Dayton, 2004.

[18] A. N. Skodras, C. A. Christopoulos, and T. Ebrahimi. JPEG 2000: The Upcoming Still Image Compression Standard. In Proc. 11th Portuguese Conf. on Pattern Recognition, pages 359-366, May 2000.

[19] D. B. Stewart. Measuring Execution Time and Real-Time Performance. In Embedded Systems Conf., pages 1-15, April 2001.

[21] W. Sweldens. The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets. Journal of Applied and Computational Harmonic Analysis, 3(2):186-200, 1996.

[22] M. A. Trenas, J. Lopez, E. L. Zapata, and F. Arguello. A Memory System Supporting the Efficient SIMD Computation of the Two Dimensional DWT. In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume 3, pages 1521-1524, May 1998.
