skip to main content
10.1145/1458082.1458109acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Modeling and exploiting query interactions in database systems

Published: 26 October 2008 Publication History

Abstract

The typical workload in a database system consists of a mixture of multiple queries of different types, running concurrently and interacting with each other. Hence, optimizing performance requires reasoning about query mixes and their interactions, rather than considering individual queries or query types. In this paper, we show the significant impact that query interactions can have on workload performance. We present a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. This approach requires no prior assumptions about the internal workings of the database system or the nature or cause of query interactions, making it portable across systems. As a concrete demonstration of the potential of capturing, modeling, and exploiting query interactions, we develop a novel interaction-aware query scheduler that targets report-generation workloads in Business Intelligence (BI) settings. Under certain assumptions, the schedule found by this scheduler is within a constant factor of optimal. An experimental evaluation with TPC-H queries on IBM DB2 demonstrates that our scheduler consistently outperforms (up to 4x) conventional schedulers that do not account for query interactions.

References

[1]
R. Abbott and H. Garcia-Molina. Scheduling real-time transactions. SIGMOD Rec., 17(1):71--81, 1988.
[2]
R. Abbott and H. Garcia-Molina. Scheduling real-time transactions with disk resident data. In VLDB, 1989.
[3]
R. K. Abbott and H. Garcia-Molina. Scheduling real-time transactions: a performance evaluation. TODS, 17(3):513--560, 1992.
[4]
M. Ahmad, A. Aboulnaga, S. Babu, and K. Munagala. Qshuffler: Getting the query mix right. In ICDE, 2008.
[5]
S. Aulbach, T. Grust, D. Jacobs, A. Kemper, and J. Rittinger. Multi-tenant databases for software as a service. In SIGMOD, 2008.
[6]
B. Babcock, S. Babu, M. Datar, R. Motwani, and D. Thomas. Operator scheduling in data stream systems. VLDB Journal, 13(4), 2004.
[7]
Business objects. http://www.businessobjects.com/.
[8]
M. J. Carey, R. Jauhari, and M. Livny. Priority in DBMS resource scheduling. In VLDB, 1989.
[9]
Cognos. http://www.cognos.com/.
[10]
R. H. Conway, W. L. Maxwell, and M. L. W. Theory of scheduling. Addison-Wesley, 1967.
[11]
CPLEX. http://www.ilog.com/products/cplex/.
[12]
S. Elnikety, E. Nahum, J. Tracey, and W. Zwaenepoel. A method for transparent admission control and request scheduling in e-commerce web sites. In WWW, 2004.
[13]
H.-U. Heiss and R. Wagner. Adaptive load control in transaction processing systems. In VLDB, 1991.
[14]
T. Ibaraki, T. Kameda, and N. Katoh. Cautious transaction schedulers for database concurrency control. TSE, 14(7), 1988.
[15]
N. Katoh, T. Ibaraki, and T. Kameda. Cautious transaction schedulers with admission control. TODS, 10(2):205--229, 1985.
[16]
T. Kelly. Detecting performance anomalies in global applications. In Proc. Workshop on Real, Large Distributed Systems, 2005.
[17]
J. Kyoung-Don Kang Son, S.H. Stankovic. Service differentiation in real-time main memory databases. In ISORC, 2002.
[18]
W.-Y. Loh. Regression trees with unbiased variable selection and interaction detection. Statistica Sinica, 12:361--386, 2002.
[19]
D. T. McWherter, B. Schroeder, A. Ailamaki, and M. Harchol-Balter. Priority mechanisms for oltp and transactional web applications. In ICDE, 2004.
[20]
D. T. McWherter, B. Schroeder, A. Ailamaki, and M. Harchol-Balter. Improving preemptive prioritization via statistical characterization of oltp locking. In ICDE, 2005.
[21]
A. Mehta, C. Gupta, and U. Dayal. BI Batch Manager: A system for managing batch workloads on enterprise data warehouses. In EDBT, 2008.
[22]
A. Mönkeberg and G. Weikum. Performance evaluation of an adaptive and robust load control method for the avoidance of data-contention thrashing. In VLDB, 1992.
[23]
K. O'Gorman, A. E. Abbadi, and D. Agrawal. Multiple query optimization in middleware using query teamwork. Software - Practice and Experience, 35(4), 2005.
[24]
H. Pang, M. J. Carey, and M. Livny. Multiclass query scheduling in real-time database systems. TKDE, 7(4):533--551, 1995.
[25]
P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2008.
[26]
H. J. Ryser. Combinatorial Mathematics. The Mathematical Association of America, 1963.
[27]
G. M. Sacco and M. Schkolnick. Buffer management in relational database systems. TODS, 11(4):473--498, 1986.
[28]
A. Schrijver. Theory of Linear and Integer Programming. Wiley, 1998.
[29]
B. Schroeder and M. Harchol-Balter. Web servers under overload: How scheduling can help. TOIT, 6(1), 2006.
[30]
B. Schroeder, M. Harchol-Balter, A. Iyengar, E. Nahum, and A. Wierman. How to determine a good multi-programming level for external scheduling. In ICDE, 2006.
[31]
Skewed TPC-D data generator. ftp://ftp.research.microsoft.com/users/viveknar/TPCDSkew/.
[32]
C. Stewart, T. Kelly, and A. Zhang. Exploiting nonstationarity for performance prediction. In EuroSys, 2007.
[33]
M. Welsh and D. Culler. Adaptive overload control for busy internet servers. In USITS, 2003.
[34]
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, second edition, June 2005.
[35]
Q. Zhang, L. Cherkasova, G. Mathews, W. Greene, and E. Smirni. R-capriccio: A capacity planning and anomaly detection tool for enterprise services with live workloads. In Middleware, 2007.
[36]
Q. Zhang, L. Cherkasova, and E. Smirni. A regression-based analytic model for dynamic resource provisioning of multi-tier applications. In ICAC, 2007.

Cited By

View all
  • (2022)Multi-Tenant Cloud Data Services: State-of-the-Art, Challenges and OpportunitiesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3522566(2465-2473)Online publication date: 10-Jun-2022
  • (2022)LYRIC: Deadline and Budget Aware Spatio-Temporal Query Processing in CloudIEEE Transactions on Services Computing10.1109/TSC.2021.307300615:5(2869-2882)Online publication date: 1-Sep-2022
  • (2020)Reducing Tail Latency In Cassandra Cluster Using Regression Based Replica Selection Algorithm2020 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)10.1109/GCAIoT51063.2020.9345823(1-7)Online publication date: 12-Dec-2020
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN:9781595939913
DOI:10.1145/1458082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 October 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. database systems
  2. query prcoessing
  3. statistical techniques

Qualifiers

  • Research-article

Conference

CIKM08
CIKM08: Conference on Information and Knowledge Management
October 26 - 30, 2008
California, Napa Valley, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)1
Reflects downloads up to 19 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Multi-Tenant Cloud Data Services: State-of-the-Art, Challenges and OpportunitiesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3522566(2465-2473)Online publication date: 10-Jun-2022
  • (2022)LYRIC: Deadline and Budget Aware Spatio-Temporal Query Processing in CloudIEEE Transactions on Services Computing10.1109/TSC.2021.307300615:5(2869-2882)Online publication date: 1-Sep-2022
  • (2020)Reducing Tail Latency In Cassandra Cluster Using Regression Based Replica Selection Algorithm2020 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)10.1109/GCAIoT51063.2020.9345823(1-7)Online publication date: 12-Dec-2020
  • (2019)A Hybrid Machine Learning Approach to Concurrent Query Performance Prediction2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)10.1109/ISKE47853.2019.9170460(1170-1177)Online publication date: Nov-2019
  • (2019)A QueryRating-Based Statistical Model for Predicting Concurrent Query Response TimeWeb Information Systems and Applications10.1007/978-3-030-30952-7_71(704-713)Online publication date: 16-Sep-2019
  • (2018)Gscheduler: A Query Scheduler Based on Query InteractionsWeb Information Systems and Applications10.1007/978-3-030-02934-0_36(395-403)Online publication date: 20-Nov-2018
  • (2017)Obtaining and Managing Answer Quality for Online Data-Intensive ServicesACM Transactions on Modeling and Performance Evaluation of Computing Systems10.1145/30552802:2(1-31)Online publication date: 26-Apr-2017
  • (2017)Distribution Based Workload Modelling of Continuous Queries in CloudsIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2016.25975465:1(120-133)Online publication date: 1-Jan-2017
  • (2017)Resource bricolage and resource selection for parallel database systemsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-016-0435-426:1(31-54)Online publication date: 1-Feb-2017
  • (2015)Performance Prediction for Concurrent Workloads in Distributed Database SystemsAlgorithms and Architectures for Parallel Processing10.1007/978-3-319-27140-8_43(626-639)Online publication date: 16-Dec-2015
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media