|
ABSTRACT
We describe the engineering of Succinct DOM (SDOM), a DOM implementation, written in C++, which is suitable for in-memory representation of large static XML documents. SDOM avoids the use of pointers, and is based upon succinct data structures, which use an information-theoretically minimum amount of space to represent an object. SDOM gives a space-efficient in-memory representation, with stable and predictable memory usage. The space used by SDOM is an order of magnitude less than that used by a standard C++ DOM representation such as Xerces, but SDOM is extremely fast: navigation is in some cases faster than for a pointer-based representation such as Xerces (even for moderate-sized documents which can comfortably be loaded into main memory by Xerces). A variant, SDOM-CT, applies bzip-based compression to textual and attribute data, and its space usage is comparable with "queryable" XML compressors. Some of these compressors support navigation and/or querying (e.g. subpath queries) of the compressed file. SDOM-CT does not support querying directly, but remains extremely fast: it is several orders of magnitude faster for navigation than queryable XML compressors that support navigation (and only a few times slower than say Xerces).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Peter Boncz , Torsten Grust , Maurice van Keulen , Stefan Manegold , Jan Rittinger , Jens Teubner, MonetDB/XQuery: a fast XQuery processor powered by a relational engine, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142527]
|
| |
3
|
BZip2. http://www.bzip.org.
|
| |
4
|
Busatto, G., Lohrey, M., and Maneth, S. 2005. Efficient memory representation of XML documents. In Proc. Database Programming Languages, 10th International Symposium, DBPL 2005. Springer LNCS 3774, pp. 199--216. doi:10.1007/11601524_13
|
| |
5
|
Cheng, J., Ng, W., 2004. XQzip: Querying compressed XML using structural indexing. In Proc. 9th International Conference on Extending Database Technology, EDBT '04. Springer LNCS 2992, pp. 219--236. doi:10.1007/b95855.
|
| |
6
|
|
| |
7
|
Delpratt, D., Rahman, N., and Raman, R. 2007. Compressed prefix sums. In Proc. Theory and Practice of Computer Science, SOFSEM '07. Springer LNCS 4362, pp. 235--247. doi:10.1007/978-3-540-69507-3_19.
|
| |
8
|
Delpratt, D., Rahman, N., and Raman, R. 2006. Engineering the LOUDS succinct tree representation. In Proc.5th Workshop on Experimental Algorithms, WEA '06. Springer LNCS 4007, pp. 134--145. doi:10.1007/11764298_12.
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
Jacobson, G. 1989. Space-efficient static trees and graphs. In Proc. 30th Symp. Foundations of Computer Science, FOCS '89, pp. 549--554. doi:10.1109/SFCS.1989.63533
|
| |
13
|
Kim, D. K., Na, J. C., Kim, J. E., and Park, K. 2005. Efficient implementation of rank and select functions for succinct representation. In Proc. 4th Workshop on Experimental Algorithms, WEA 2005. Springer LNCS 3503, pp. 315--327. doi:10.1007/11427186_28.
|
 |
14
|
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
Raman, R. and Rao, S. S. 2003. Succinct dynamic dictionaries and trees. In Proc. ICALP 2003, Springer LNCS 2719, pp. 357--368.
|
| |
20
|
Saxon. http://saxon.source.forge.net/
|
| |
21
|
|
| |
22
|
Xerces C++ Parser. http://xerces.apache.org/xerces-c/
|
| |
23
|
Xerces Java 2 Parser. http://xerces.apache.org/xerces2-j/
|
| |
24
|
UW XML Repository. http://www.cs.washingon.edu/research/xmldatasets/www/repository.html
|
| |
25
|
VOTable Documentation. http://www.us-vo.org/VOTable/
|
| |
26
|
W3C DOM API documentation. 2004. http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
|
| |
27
|
W3C DOM Traversal, 2000. http://www.w3.org/TR/DOM-Level-2-Traversal-Range/traversal.html
|
| |
28
|
|
|