ABSTRACT
In order to meet service level agreements (SLAs) and to maintain peak performance for database management systems (DBMS), database administrators (DBAs) need to implement policies for effective workload scheduling, admission control, and resource provisioning. Accurately predicting the response times of DBMS queries is necessary for a DBA to achieve these goals. This task is particularly challenging because a database workload typically consists of many concurrently running queries, and an accurate model needs to capture their interactions. Additional challenges arise when DBMSes run in dynamic cloud computing environments, where the workload, data, and physical resources can change frequently and on the fly. Building an efficient and highly accurate online DBMS performance model that is robust to changing workloads, data evolution, and physical resource allocations is still an unsolved problem. In this work, our goal is to build such an online performance model for database appliances using an experiment-driven modeling approach. We use a Bayesian approach and build novel Gaussian models that take into account the interactions among concurrently executing queries and predict the response times of individual DBMS queries. A key feature of our modeling approach is that the models can be updated online in response to new queries or data, or to changing resource allocations. We experimentally demonstrate that our models are accurate and effective: our best models have an average prediction error of 16.3% in the worst case.
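To make the idea of an online Gaussian model concrete, here is a minimal sketch, not the authors' implementation. It assumes plain Gaussian-process regression with a squared-exponential kernel, fixed (hypothetical) hyperparameters, a zero prior mean, and a hypothetical feature vector that counts how many instances of each query type are running concurrently. Each new observation is absorbed by growing the inverse covariance matrix with a block (Schur-complement) update, which costs O(n²) per observation instead of the O(n³) needed to re-invert from scratch. The class name `OnlineGP` and the example numbers are illustrative only.

```python
import math


def rbf(a, b, signal_var=1.0, length=1.0):
    """Squared-exponential covariance between two feature vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-d2 / (2.0 * length ** 2))


class OnlineGP:
    """Zero-mean GP regression (hypothetical sketch) updated one sample at a time."""

    def __init__(self, noise_var=1e-2, signal_var=1.0, length=1.0):
        self.noise_var = noise_var
        self.kern = lambda a, b: rbf(a, b, signal_var, length)
        self.X, self.y = [], []
        self.K_inv = []  # inverse of (K + noise_var * I) over the training inputs

    def add(self, x, y):
        """Absorb one observation via the block-matrix (Schur-complement) inverse update."""
        b = [self.kern(xi, x) for xi in self.X]          # covariances with old inputs
        c = self.kern(x, x) + self.noise_var              # new diagonal entry
        v = [sum(row[j] * b[j] for j in range(len(b))) for row in self.K_inv]  # K_inv @ b
        s = c - sum(bi * vi for bi, vi in zip(b, v))     # Schur complement (positive)
        n = len(self.X)
        # Top-left block: K_inv + v v^T / s; last column: -v / s.
        new = [[self.K_inv[i][j] + v[i] * v[j] / s for j in range(n)] + [-v[i] / s]
               for i in range(n)]
        new.append([-vj / s for vj in v] + [1.0 / s])    # last row: [-v^T / s, 1 / s]
        self.K_inv = new
        self.X.append(list(x))
        self.y.append(y)

    def predict(self, x):
        """Return (posterior mean, posterior variance of the latent function) at x."""
        k = [self.kern(xi, x) for xi in self.X]
        Kk = [sum(row[j] * k[j] for j in range(len(k))) for row in self.K_inv]  # K_inv @ k
        mean = sum(Kk[i] * self.y[i] for i in range(len(k)))                    # k^T K_inv y
        var = self.kern(x, x) - sum(ki * Kki for ki, Kki in zip(k, Kk))         # k(x,x) - k^T K_inv k
        return mean, var


# Hypothetical usage: features are concurrent instances of two query types,
# targets are made-up response times in seconds (not data from the paper).
gp = OnlineGP(noise_var=1e-4, signal_var=4.0, length=2.0)
for mix, response_time in [((1, 0), 1.0), ((2, 1), 2.5), ((3, 2), 4.0)]:
    gp.add(mix, response_time)
print(gp.predict((2, 2)))
```

Because the prior mean is zero, predictions far from any observed query mix fall back toward zero; a practical model would at least center the response times or use a non-zero prior mean. This sketch also only handles adding observations; reacting to new data or changed resource allocations would require additional inputs or re-fitted hyperparameters, which are not shown here.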
Index Terms
- A bayesian approach to online performance modeling for database appliances using gaussian models