Compressing and searching XML data via two zips
Source: International World Wide Web Conference
Proceedings of the 15th international conference on World Wide Web
Edinburgh, Scotland
SESSION: XML
Pages: 751-760
Year of Publication: 2006
ISBN: 1-59593-323-9
Authors
P. Ferragina, Univ. Pisa
F. Luccio, Univ. Pisa
G. Manzini, Univ. Piemonte Orientale
S. Muthukrishnan, Rutgers Univ.
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1135777.1135891

ABSTRACT

XML is fast becoming the standard format for storing, exchanging and publishing data over the web, and is increasingly embedded in applications. Two challenges in handling XML are its size (the XML representation of a document is significantly larger than its native state) and the complexity of its search (XML search involves path and content searches on labeled tree structures). We address the basic problems of compression, navigation and searching of XML documents. In particular, we adopt recently proposed theoretical algorithms [11] for succinct tree representations to design and implement a compressed index for XML, called XBzipIndex, in which the XML document is maintained in a highly compressed format, and both navigation and searching can be done by uncompressing only a tiny fraction of the data. This solution relies on compressing and indexing two arrays derived from the XML data. With detailed experiments we compare it with other compressed XML indexing and searching engines, showing that XBzipIndex achieves compression ratios up to 35% better than those achievable by the other tools, and that its time performance on some path and content search operations is orders of magnitude faster: a few milliseconds over hundreds of MBs of XML files versus tens of seconds, on standard XML data sources.
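
The abstract does not say how the two arrays are built. As a rough illustration only, the sketch below assumes they resemble an XBW-style transform of the labeled tree: every node contributes its label and a "last child" flag, and nodes are stably sorted by the labels on their upward path to the root. The tuple-based tree type, the function name two_arrays and the sample tree are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Toy tree type (an assumption for illustration): a node is (label, children).
Node = Tuple[str, list]


def two_arrays(root: Node) -> Tuple[List[int], List[str]]:
    """Return (last_flags, labels) for the tree, ordered by upward path."""
    rows = []  # (upward_path, last_flag, label), collected in pre-order

    def visit(node: Node, path: Tuple[str, ...], is_last: bool) -> None:
        label, children = node
        rows.append((path, 1 if is_last else 0, label))
        # Upward path of a child: its parent's label first, the root's label last.
        child_path = (label,) + path
        for i, child in enumerate(children):
            visit(child, child_path, i == len(children) - 1)

    visit(root, (), True)
    # list.sort is stable, so nodes with equal paths keep their pre-order position.
    rows.sort(key=lambda row: row[0])
    return [row[1] for row in rows], [row[2] for row in rows]


# Hypothetical document: <a><b><c/></b><b/></a>
tree: Node = ("a", [("b", [("c", [])]), ("b", [])])
last_flags, labels = two_arrays(tree)
print(last_flags, labels)
```

In this ordering the children of all nodes sharing a label end up in one contiguous block, which is the kind of property that lets a path query be answered by looking at a small slice of the arrays instead of the whole document.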


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

[4] A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, and A. Pugliese. XQueC: pushing queries to compressed XML data. In VLDB, 2003.
[9] J. Cheney. An empirical evaluation of simple DTD-conscious compression techniques. In WebDB, 2005.
[10] J. Cheng and W. Ng. XQzip: Querying compressed XML using structural indexing. In International Conference on Extending Database Technology, pages 219-236, 2004.
[20] W. Y. Lam, W. Ng, P. T. Wood, and M. Levene. XCQ: XML compression and querying system. In WWW, 2003.

