DOI:10.1145/1281700.1281707

An analysis of XML compression efficiency

Published: 13 June 2007

Abstract

XML simplifies data exchange among heterogeneous computers, but it is notoriously verbose and has spawned many XML-specific compressors and binary formats. We present an XML test corpus and a combined efficiency metric integrating compression ratio and execution speed. We use this corpus and linear regression to assess 14 general-purpose and XML-specific compressors relative to the proposed metric. We also identify key factors to consider when selecting a compressor. Our results show that XMill or WBXML may be useful in some instances, but a general-purpose compressor is often the best choice.
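
As a rough illustration of the two quantities the abstract combines, the sketch below measures compression ratio and execution time for two general-purpose compressors from the Python standard library. It is not the paper's metric or corpus: the abstract does not state how ratio and speed are integrated, so the two values are only reported side by side. The helper `measure` and the generated `sample` document are hypothetical names introduced here for the example.

```python
import bz2
import time
import zlib


def measure(name, compress, data):
    """Return (name, compressed/original size ratio, elapsed seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(packed) / len(data)  # smaller means better compression
    return name, ratio, elapsed


# Hypothetical sample input: repetitive markup, standing in for a real XML file.
sample = (
    b"<?xml version='1.0'?><orders>"
    + b"".join(
        b"<order id='%d'><qty>%d</qty></order>" % (i, i % 7)
        for i in range(2000)
    )
    + b"</orders>"
)

results = [
    measure("zlib", zlib.compress, sample),
    measure("bz2", bz2.compress, sample),
]

for name, ratio, seconds in results:
    print(f"{name}: ratio={ratio:.3f}, time={seconds * 1000:.2f} ms")
```

Timings from a single run on a small input are noisy; a fairer comparison would repeat each measurement over a varied set of documents, which is the role a test corpus plays.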

Published In

ExpCS '07: Proceedings of the 2007 workshop on Experimental computer science
June 2007
218 pages
ISBN:9781595937513
DOI:10.1145/1281700
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. XML
  2. binary format
  3. compression
  4. corpus
  5. linear regression

Qualifiers

  • Article

Conference

ExpCS07: Workshop on Experimental Computer Science
June 13 - 14, 2007
San Diego, California
