ABSTRACT
Triplestores are data management systems for storing and querying RDF data. Over recent years, various benchmarks have been proposed to assess the performance of triplestores across different performance measures. However, choosing the most suitable benchmark for evaluating triplestores in practical settings is not a trivial task, because triplestores experience varying workloads when deployed in real applications. We address the problem of determining an appropriate benchmark for a given real-life workload by providing a fine-grained comparative analysis of existing triplestore benchmarks. In particular, we analyze the data and queries provided with the existing triplestore benchmarks, as well as several real-world datasets. Furthermore, we measure the correlation between query execution time and various SPARQL query features, and rank those features by their significance levels. Our experiments reveal several interesting insights about the design of such benchmarks. With this fine-grained evaluation, we aim to support the design and implementation of more diverse benchmarks. Application developers can use our results to analyze their data and queries and to choose a suitable data management system.