research-article
Open Access

DIAMETRICS: benchmarking query engines at scale

Published: 22 November 2022

Abstract

This paper introduces DIAMETRICS: a novel framework for end-to-end benchmarking and performance monitoring of query engines. DIAMETRICS consists of a number of components supporting tasks such as automated workload summarization, data anonymization, benchmark execution, monitoring, regression identification, and alerting. The architecture of DIAMETRICS is highly modular and supports multiple systems by abstracting their implementation details and relying on common canonical formats and pluggable software drivers. The end result is a powerful unified framework that is capable of supporting every aspect of benchmarking production systems and workloads. DIAMETRICS was developed at Google and is being used to benchmark various internal query engines. In this paper, we give an overview of DIAMETRICS and discuss its design and implementation. Furthermore, we provide details about its deployment and example use cases. Given the variety of supported systems and use cases within Google, we argue that its core concepts can be used more widely to enable comparative end-to-end benchmarking in other industrial environments.

Published in: Communications of the ACM, Volume 65, Issue 12, December 2022, 102 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3572809
Editor: James Larus

Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
