DOI:10.1145/1998582.1998603
research-article

A Bayesian approach to online performance modeling for database appliances using Gaussian models

Published: 14 June 2011

ABSTRACT

To meet service level agreements (SLAs) and maintain peak performance for database management systems (DBMS), database administrators (DBAs) need to implement policies for effective workload scheduling, admission control, and resource provisioning. Accurately predicting the response times of DBMS queries is necessary for a DBA to achieve these goals. This task is particularly challenging because a database workload typically consists of many concurrently running queries, and an accurate model needs to capture their interactions. Additional challenges arise when DBMSes run in dynamic cloud computing environments, where workload, data, and physical resources can change frequently, on the fly. Building an efficient and highly accurate online DBMS performance model that is robust in the face of changing workloads, data evolution, and physical resource allocations is still an unsolved problem. In this work, our goal is to build such an online performance model for database appliances using an experiment-driven modeling approach. We use a Bayesian approach and build novel Gaussian models that take into account the interaction among concurrently executing queries and predict response times of individual DBMS queries. A key feature of our modeling approach is that the models can be updated online in response to new queries or data, or changing resource allocations. We experimentally demonstrate that our models are accurate and effective: our best models have an average prediction error of 16.3% in the worst case.
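
To make the idea of an online-updatable Gaussian model concrete, the following is a minimal sketch and not the models built in this work. It assumes a much simpler setting: a Bayesian linear regression with a Gaussian prior over the weights and Gaussian noise, where the feature vector holds a bias term plus the number of concurrently running queries of each type. The class name, the feature layout, the prior and noise variances, and the synthetic response times are all hypothetical and chosen only for illustration.

```python
class OnlineGaussianLinearModel:
    """Bayesian linear regression (Gaussian prior, Gaussian noise).

    Hypothetical illustration only: the posterior over the weights is
    refined one observation at a time, so no retraining from scratch
    is needed when new measurements arrive.
    """

    def __init__(self, n_features, prior_var=100.0, noise_var=1.0):
        self.noise_var = noise_var
        # Prior: weights ~ N(0, prior_var * I)
        self.mean = [0.0] * n_features
        self.cov = [[prior_var if i == j else 0.0 for j in range(n_features)]
                    for i in range(n_features)]

    def _cov_times(self, x):
        # Matrix-vector product cov @ x
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.cov]

    def predict(self, x):
        """Return the predictive mean and variance for feature vector x."""
        mu = sum(m * xi for m, xi in zip(self.mean, x))
        sx = self._cov_times(x)
        var = self.noise_var + sum(xi * si for xi, si in zip(x, sx))
        return mu, var

    def update(self, x, y):
        """Fold one observation (x, y) into the posterior.

        This is a rank-one update of the posterior covariance, so the
        cost per observation is quadratic in the number of features.
        """
        sx = self._cov_times(x)
        denom = self.noise_var + sum(xi * si for xi, si in zip(x, sx))
        err = y - sum(m * xi for m, xi in zip(self.mean, x))
        n = len(x)
        self.mean = [m + si * err / denom for m, si in zip(self.mean, sx)]
        self.cov = [[self.cov[i][j] - sx[i] * sx[j] / denom for j in range(n)]
                    for i in range(n)]


# Synthetic example: features are [bias, #type-A queries, #type-B queries].
# The "true" response time below is made up for this sketch.
model = OnlineGaussianLinearModel(n_features=3)
for a in range(5):
    for b in range(5):
        response_time = 2.0 + 1.5 * a + 0.5 * b
        model.update([1.0, float(a), float(b)], response_time)

predicted_mean, predicted_var = model.predict([1.0, 3.0, 2.0])
print(predicted_mean, predicted_var)
```

Each call to `update` costs time quadratic in the number of features, which is what makes this kind of model cheap to keep current as new measurements arrive. The predictive variance gives a rough indication of how far a query mix lies from the mixes observed so far.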



    • Published in

      ICAC '11: Proceedings of the 8th ACM international conference on Autonomic computing
      June 2011
      278 pages
      ISBN:9781450306072
      DOI:10.1145/1998582

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



