DOI: 10.1145/3308558.3313556

How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benchmarks

Published: 13 May 2019

ABSTRACT

Triplestores are data management systems for storing and querying RDF data. Over recent years, various benchmarks have been proposed to assess the performance of triplestores across different performance measures. However, choosing the most suitable benchmark for evaluating triplestores in practical settings is not a trivial task, because triplestores experience varying workloads when deployed in real applications. We address the problem of determining an appropriate benchmark for a given real-life workload by providing a fine-grained comparative analysis of existing triplestore benchmarks. In particular, we analyze the data and queries provided with the existing triplestore benchmarks in addition to several real-world datasets. Furthermore, we measure the correlation between query execution time and various SPARQL query features, and rank those features by their significance levels. Our experiments reveal several interesting insights about the design of such benchmarks. With this fine-grained evaluation, we aim to support the design and implementation of more diverse benchmarks. Application developers can also use our results to analyze their data and queries and choose a suitable data management system.
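To make the feature-ranking step concrete, the snippet below is a minimal sketch, assuming SciPy, of how SPARQL query features might be correlated with measured execution times and ranked by significance level. It is not the authors' code; every feature name and value in it is a hypothetical placeholder.

```python
# A minimal sketch (not the paper's implementation) of ranking SPARQL query
# features by how strongly they correlate with query execution time.
# All feature names and measurements below are hypothetical placeholders.
from scipy.stats import spearmanr

# Hypothetical per-query feature values, one entry per benchmark query,
# aligned with the measured execution times below.
features = {
    "triple_patterns": [1, 3, 5, 2, 8, 4],
    "join_vertices":   [0, 1, 3, 1, 5, 2],
    "result_size":     [10, 250, 900, 40, 5000, 120],
    "filter_count":    [0, 1, 0, 2, 1, 0],
}
exec_time_ms = [12, 85, 310, 40, 1200, 95]  # hypothetical runtimes

# Spearman's rank correlation is robust to the heavy-tailed, non-linear
# runtime distributions typical of SPARQL workloads; the p-value gives
# the significance level used for ranking.
ranking = [
    (name, *spearmanr(values, exec_time_ms))
    for name, values in features.items()
]

# Most significant features first (smallest p-value).
for name, rho, p in sorted(ranking, key=lambda t: t[2]):
    print(f"{name:16s} rho={rho:+.2f} p={p:.3f}")
```

In the setting the abstract describes, such feature values would come from analyzing the benchmark queries themselves (e.g., counting triple patterns or join vertices) and timing them against a triplestore.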


Published in

WWW '19: The World Wide Web Conference
May 2019, 3620 pages
ISBN: 9781450366748
DOI: 10.1145/3308558
Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

