ACM Home Page
Please provide us with feedback. Feedback
Efficiently decodable and searchable natural language adaptive compression
Full text PdfPdf (233 KB)
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Salvador, Brazil
SESSION: Efficiency table of contents
Pages: 234 - 241  
Year of Publication: 2005
ISBN:1-59593-034-5
Authors
Nieves R. Brisaboa  University da Coruña, A Coruña, Spain
Antonio Fariña  University da Coruña, A Coruña, Spain
Gonzalo Navarro  University of Chile, Santiago, Chile
José R. Paramá  University da Coruña, Coruña, Spain
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 76,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1076034.1076076
What is a DOI?

ABSTRACT

We address the problem of adaptive compression of natural language text, focusing on the case where low bandwidth is available and the receiver has little processing power, as in mobile applications. Our technique achieves compression ratios around 32% and requires very little effort from the receiver. This tradeoff, not previously achieved with alternative techniques, is obtained by breaking the usual symmetry between sender and receiver dominant in statistical adaptive compression. Moreover, we show that our technique can be adapted to avoid decompression at all in cases where the receiver only wants to detect the presence of some keywords in the document. This is useful in scenarios such as selective dissemination of information, news clipping, alert systems, text categorization, and clustering. Thanks to the asymmetry we introduce, the receiver can search the compressed text much faster than the plain text. This was previously achieved only in semistatic compression scenarios.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
N. Brisaboa, A. Fariña, G. Navarro, and M. Esteller. (s,c)-dense coding: An optimized compression code for natural language text databases. Proc. 10th SPIRE, LNCS 2857, pp. 122--136, 2003.
 
5
N. Brisaboa, A. Fariña, G. Navarro, and J. Paramá Simple, fast, and efficient natural language adaptive compression. Proc. 11th SPIRE, LNCS 3246, pp. 230--241, 2004.
 
6
N. Brisaboa, E. Iglesias, G. Navarro, and J. Paramá An efficient compression code for text databases. Proc 25th ECIR, LNCS 2633, pp. 468--481, 2003.
 
7
M. Burrows and D. Wheeler. A block-sorting lossless data compression algorithm. TR 124, DEC, 1994.
 
8
J. Carpinelli, A. Moffat, R. Neal, W. Salamonsen, L. Stuiver, A. Turpin, and I. Witten. Word, character, integer, and bit based compression using arithmetic coding. www.cs.mu.oz.au/~alistair/arith_coder, 1999.
9
10
 
11
A. Fariña. New Compression Codes for Text Databases. PhD thesis, Univ. A Coruñ a, Comp. Sci. Dept., A Coruñ a, Spain, 2005. To appear.
 
12
 
13
R. N. Horspool. Practical fast searching in strings. Soft. Pract. Exp., 10(6):501--506, 1980.
 
14
D. A. Huffman. A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng., 40(9):1098--1101, 1952.
 
15
 
16
 
17
 
18
 
19
 
20
A. Turpin and A. Moffat. Fast file search using text compression. Proc. 20th Australian Comp. Sci. Conf., pp. 1--8, 1997.
 
21
 
22
G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.
 
23
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE TIT, 23(3):337--343, 1977.
 
24
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE TIT, 24(5):530--536, 1978.
 
25

Collaborative Colleagues:
Nieves R. Brisaboa: colleagues
Antonio Fariña: colleagues
Gonzalo Navarro: colleagues
José R. Paramá: colleagues