Binary RDF for scalable publishing, exchanging and consumption in the web of data

ABSTRACT
The Web of Data is increasingly producing large RDF datasets from diverse fields of knowledge, pushing the Web towards a data-to-data cloud. However, traditional RDF representations were inspired by a document-centric view, which results in verbose and redundant data that are costly to exchange and post-process. This article discusses an ongoing doctoral thesis addressing efficient formats for the publication, exchange and consumption of RDF at large scale. First, a binary serialization format for RDF, called HDT, is proposed. Then, we focus on compressed data structures with rich functionality, which are part of the efficient HDT representation and of most applications that operate on huge RDF datasets.
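The abstract does not detail HDT's internals. As a rough illustration of why an ID-based binary layout avoids the redundancy of a document-centric syntax, the Python sketch below dictionary-encodes a few triples and writes the resulting integer IDs as variable-length integers. It is a simplified sketch under stated assumptions, not the HDT format: HDT separates a Header, a Dictionary and a Triples component and uses its own, more elaborate encodings, none of which are reproduced here. The single shared ID space, the subject-sorted grouping, the varint byte layout, the function names and the example IRIs are all hypothetical choices made for brevity.

```python
from collections import defaultdict


def build_dictionary(triples):
    """Assign a dense integer ID (starting at 1) to every distinct RDF term.

    Assumption: one shared ID space for all terms. HDT instead keeps
    separate dictionary sections; this is a simplification."""
    term_to_id = {}
    for triple in triples:
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id) + 1
    return term_to_id


def encode_triples(triples, term_to_id):
    """Replace every term by its ID and sort, so triples sharing a subject are adjacent."""
    return sorted(tuple(term_to_id[t] for t in triple) for triple in triples)


def group_by_subject(id_triples):
    """Adjacency-list view of sorted ID triples: subject -> predicate -> [objects]."""
    forest = defaultdict(lambda: defaultdict(list))
    for s, p, o in id_triples:
        forest[s][p].append(o)
    return forest


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def serialize(id_triples):
    """Concatenate the varint-encoded IDs of all triples (dictionary not included)."""
    return b"".join(encode_varint(x) for triple in id_triples for x in triple)


# Hypothetical example data, written as N-Triples-style term strings.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
triples = [
    (f"<{EX}alice>", f"<{FOAF}knows>", f"<{EX}bob>"),
    (f"<{EX}alice>", f"<{FOAF}name>", '"Alice"'),
    (f"<{EX}bob>", f"<{FOAF}knows>", f"<{EX}alice>"),
]

term_to_id = build_dictionary(triples)
id_triples = encode_triples(triples, term_to_id)
payload = serialize(id_triples)
print(id_triples)    # [(1, 2, 3), (1, 4, 5), (3, 2, 1)]
print(len(payload))  # 9 (bytes for the three ID triples)
```

Each repeated IRI is stored once in the dictionary and then referenced by a small integer, so a repetition costs one or two bytes instead of the full string. Sorting by subject also produces the adjacency-list grouping returned by `group_by_subject`, the kind of regular layout that compressed, queryable structures can exploit.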