ABSTRACT
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip; a collection of datatype-specific compressors for simple data types; and, possibly, user-defined compressors for application-specific data types.
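To illustrate why pairing a datatype-specific compressor with zlib can beat zlib alone, here is a minimal Python sketch. It is not XMill's implementation: the function names, the delta-plus-fixed-width integer encoding, and the `COMPRESSORS` registry are all assumptions made for illustration. Only the general idea of combining zlib with per-datatype and user-supplied compressors comes from the abstract.

```python
import struct
import zlib


def compress_text(values):
    """Baseline: join the values as text and hand them straight to zlib."""
    return zlib.compress("\n".join(values).encode("utf-8"), 9)


def compress_integers(values):
    """Hypothetical datatype-specific path for integer values.

    Delta-encode the integers, pack each delta as a 4-byte signed
    little-endian int, then apply zlib to the packed bytes.
    Assumes every value and every delta fits in 32 bits.
    """
    numbers = [int(v) for v in values]
    deltas = [b - a for a, b in zip([0] + numbers, numbers)]
    packed = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(packed, 9)


def decompress_integers(blob):
    """Invert compress_integers and return the values as strings."""
    packed = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(packed) // 4}i", packed)
    values, total = [], 0
    for delta in deltas:
        total += delta
        values.append(str(total))
    return values


# Hypothetical registry: a user-defined compressor for an
# application-specific type would be added here under its own key.
COMPRESSORS = {"text": compress_text, "int": compress_integers}


if __name__ == "__main__":
    # Synthetic data: evenly spaced identifiers, as might appear in one XML field.
    ids = [str(1000000 + 7 * i) for i in range(5000)]
    for name, compress in COMPRESSORS.items():
        print(name, len(compress(ids)), "bytes")
```

On regular numeric data like this, the integer path yields a much smaller output than plain zlib over the text, because the deltas collapse into a highly repetitive byte stream. Real XML data is less regular, so the gain depends on the data.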