Abstract
This paper introduces DIAMETRICS: a novel framework for end-to-end benchmarking and performance monitoring of query engines. DIAMETRICS consists of a number of components supporting tasks such as automated workload summarization, data anonymization, benchmark execution, monitoring, regression identification, and alerting. The architecture of DIAMETRICS is highly modular and supports multiple systems by abstracting their implementation details and relying on common canonical formats and pluggable software drivers. The result is a unified framework capable of supporting every aspect of benchmarking production systems and workloads. DIAMETRICS has been developed at Google and is being used to benchmark various internal query engines. In this paper, we give an overview of DIAMETRICS and discuss its design and implementation. Furthermore, we provide details about its deployment and example use cases. Given the variety of supported systems and use cases within Google, we argue that its core concepts can be used more widely to enable comparative end-to-end benchmarking in other industrial environments.