DIAMETRICS: benchmarking query engines at scale

Authors:
Shaleen Deep, University of Wisconsin-Madison, Madison, WI
Anja Gruenheid, Google Inc
Kruthi Nagaraj, Google Inc
Hiro Naito, Google Inc
Jeff Naughton, Google Inc
Stratis Viglas, Google Inc

Communications of the ACM, Volume 65, Issue 12, December 2022, pp. 105–112. https://doi.org/10.1145/3567464

Published: 22 November 2022

Abstract

This paper introduces DIAMETRICS: a novel framework for end-to-end benchmarking and performance monitoring of query engines. DIAMETRICS consists of a number of components supporting tasks such as automated workload summarization, data anonymization, benchmark execution, monitoring, regression identification, and alerting. The architecture of DIAMETRICS is highly modular and supports multiple systems by abstracting their implementation details and relying on common canonical formats and pluggable software drivers. The end result is a powerful unified framework that is capable of supporting every aspect of benchmarking production systems and workloads. DIAMETRICS has been developed at Google and is being used to benchmark various internal query engines. In this paper, we give an overview of DIAMETRICS and discuss its design and implementation. Furthermore, we provide details about its deployment and example use cases. Given the variety of supported systems and use cases within Google, we argue that its core concepts can be used more widely to enable comparative end-to-end benchmarking in other industrial environments.

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Metrics

Published in
Communications of the ACM, Volume 65, Issue 12, December 2022, 102 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3572809
Editor: James Larus
Association for Computing Machinery, New York, NY
Copyright © 2022 ACM