ABSTRACT
Database as a service provided on cloud computing platforms has been rapidly gaining popularity in recent years. The Snowflake Elastic Data Warehouse (henceforth referred to as Snowflake) is a cloud database service provided by Snowflake Computing. The cloud-native capabilities of new database services such as Snowflake bring exciting new opportunities for database testing. First, Snowflake maintains extensive knowledge of historical customer queries, including both the query text and the corresponding system configurations. Second, Snowflake is multi-tenant, which provides easy access to the metadata and data needed to rerun customer queries from a privileged role. Third, the elastic nature of Snowflake's data warehouse service allows testing with these queries on a separate set of resources without impacting the customer's production workload.
This paper presents Snowtrail, an infrastructure developed within Snowflake for testing using customer production queries with result obfuscation. Running tests with production queries gives us direct insight into the impact of improvements and new features on customer workloads. It also enables testing on queries with a wider range of shapes and complexity than developers can construct manually. Snowtrail is also used to help ensure the stability of the system's online upgrade process.
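As a rough illustration of what comparing results "with result obfuscation" could look like, the sketch below hashes each result row and compares only the digests from a baseline run and a candidate run. The abstract does not specify the obfuscation scheme, so the use of SHA-256, the order-insensitive comparison, and the names `obfuscated_digest` and `results_match` are all assumptions made for this example, not a description of Snowtrail itself.

```python
import hashlib
from typing import Iterable, Sequence


def obfuscated_digest(rows: Iterable[Sequence[object]]) -> str:
    """Return an order-insensitive digest of a result set (hypothetical helper).

    Each row is hashed on its own, then the sorted row hashes are hashed
    again, so a tester only ever handles digests, never raw customer values.
    Duplicate rows are preserved, so the comparison is multiset-based.
    """
    row_hashes = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    outer = hashlib.sha256()
    for row_hash in row_hashes:
        outer.update(row_hash.encode("ascii"))
    return outer.hexdigest()


def results_match(baseline_rows, candidate_rows) -> bool:
    """Compare two result sets by digest only (assumed comparison policy)."""
    return obfuscated_digest(baseline_rows) == obfuscated_digest(candidate_rows)


if __name__ == "__main__":
    baseline = [(1, "a"), (2, "b")]
    candidate = [(2, "b"), (1, "a")]  # same rows, different order
    print(results_match(baseline, candidate))
```

Sorting the row hashes makes the check insensitive to row order, which suits queries without an `ORDER BY`; a query with a required ordering would need the row hashes combined in their original sequence instead.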
Index Terms
- Snowtrail: Testing with Production Queries on a Cloud Database