ABSTRACT
Enterprises may have multiple database systems spread across the organization for redundancy or for serving different applications. In such systems, query workloads can be distributed across different servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxiliary table with pre-computed data that can be used to significantly improve the performance of a database query. In this paper, we propose a framework for coordinating execution of OLAP query workloads across a database cluster with shared nothing architecture. Such coordination is complex since we need to consider (1) the time to build the MQTs, (2) the query execution impact of the MQTs, (3) whether the MQTs can fit in the disk space limitation, (4) server computation power, and (5) the effectiveness of the scheduling and placement algorithms in deriving a combination of configurations so that the workload can be completed in the shortest time period. We frame the problem as a combinatorial problem with a solution space that is exponential in the number of queries, MQTs, and servers. We provide a stochastic search heuristic that finds a near-optimal mapping of queries-to-servers and MQTs-to-servers within an arbitrarily bounded time and compare our solution with an exhaustive search and three standard greedy algorithms. Our search implementation produced schedules within 9% of the optimal found through an exhaustive search and produced better solutions than typical greedy algorithms for both TPC-H and synthetic benchmarks under a variety of experiments. For a key trial where disk space is limited, it produced 15% better results than the next best competitor, corresponding to an absolute wall clock advantage of over 10 hours.
- S. Agrawal, S. Chaudhuri, and V. Narasayya. "Automated selection of materialized views and indexes for SQL database," In Proceedings of VLDB. Google ScholarDigital Library
- S. Agrawal, E. Chu, and V. Narasayya. "Automatic physical design tuning: workload as a sequence," In Proceedings of SIGMOD, 2006. Google ScholarDigital Library
- J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A. Mandal, and K. Kennedy. "Task scheduling strategies for workflow-based applications in grids," In Proceedings of CCGrid, 2005. Google ScholarDigital Library
- T. Braun, et al. "A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems," Journal of Parallel and Distributed Computing, vol. 61, no. 6, June 2001. Google ScholarDigital Library
- J. Casanova, A. Legrand, D. Zagorodnow, and F. Berman. "Heuristics for scheduling parameter sweep applications in grid environments," In Proceedings of the Heterogeneous Computing Workshop, 2000. Google ScholarDigital Library
- A. Chakrabarti, et al. "Integration of Scheduling and Replication in Data Grids." In Proceedings of the International Conference on High Performance Computing, 2004. Google ScholarDigital Library
- S. Chaudhuri, R. Krishnamurthy, S. Potmianos, and K. Shim. "Optimizing Queries with Materialized Views," In Proceedings of ICDE, 1995. Google ScholarDigital Library
- P. Chen. "Optimal file allocation in multilevel storage systems." In Proceedings of AFIPS, 1973.Google Scholar
- L. Davis. "Job Shop Scheduling with Genetic Algorithms," In Proceedings of the International Conference on Genetic Algorithms, 1985. Google ScholarDigital Library
- L. Dowdy and D. Foster. "Comparative Models of the File Assignment Problem." ACM Computing Surveys, vol 14, no. 2, 1982. Google ScholarDigital Library
- A. Eiben and J. Smith. Introduction to Evolutionary Computing, Springer 2007. Google ScholarDigital Library
- D. Foster, L. Dowdy, and J. Ames. "File assignment in a computer network." Computer Networks 5, Sept. 1981.Google Scholar
- D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Kluwer Academic, 1989. Google ScholarDigital Library
- J. Holland. Adaptation in natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992. Google ScholarDigital Library
- P. Hughes and G. Moe. "A structural approach to computer performance analysis." In Proceedings of AFIPS, 1973.Google Scholar
- IBM DB2 9, www.ibm.com/software/data/db2/9/Google Scholar
- H. Jiang, D. Gao, W.-S. Li. "Exploiting Correlation and Parallelism for Materialized-View Recommendation in Distributed Data Warehouses," In Proceedings of ICDE, 2007.Google Scholar
- P. Larson and H. Yang. "Computing Queries From Derived Relations," In Proceedings of VLDB, 1985. Google ScholarDigital Library
- W.-S. Li, V. Batra, V. Raman, W. Han, K. Candan, and I. Narang. "Load and Network Aware Query Routing for Information Integration," In Proceedings of ICDE, 2005. Google ScholarDigital Library
- Z. Michaelwicz and D. Fogel. How to Solve It: Modern Heuristics, Springer 2004. Google ScholarDigital Library
- Microsoft SQL Server, www.microsoft.com/sq1/default.mspxGoogle Scholar
- H. Mohamed and D. Epema. "An evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusers." In Proceedings of the IEEE International Conference on Cluster Computing. Google ScholarDigital Library
- MySQL Cluster www.mysql.com/products/database/cluster/.Google Scholar
- Oracle, www.oracle.comGoogle Scholar
- Oracle 11g Real Application Clusters, www.oracle.com/technology/products/database/clustering/index.htmlGoogle Scholar
- K. Ranganathan and I. Foster. "Computation Scheduling and Data Replication Algorithms for Data Grids." Grid Resource Management: State of the Art and Future Trends. Kluwer Academic Publishers, 2003. Google ScholarDigital Library
- J. Rao, C. Zhang, N. Megiddo, and G. Lohman. "Automating physical database design in a parallel database," In Proceedings of SIGMOD, 2002. Google ScholarDigital Library
- Redbrick, www.informix.comGoogle Scholar
- E. Rudensteiner, A. Koeller, and X. Zhang. "Maintaining Data Warehouses over Changing Information Sources," Communications of the ACM, vol. 43, no. 6, 2000. Google ScholarDigital Library
- K. Salem, K. Beyer, R. Cochrane, and B. Lindsay. "How to roll a join: Asynchronous Incremental View Maintenance," In Proceedings of SIGMOD, 2000. Google ScholarDigital Library
- E. Santos-Neto, W. Cirne, F. Brasileiro, and A. Lima. "Exploiting Replications and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids." In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing, 2004. Google ScholarDigital Library
- E. Schmueli and D. Feitelson. "Backfilling with lookahed to optimize the packing of parallel jobs," Springer-Verlag Lecture Notes in Computer Science, vol 2862, 2003.Google Scholar
- A. Venugopal, R. Buyya, and K. Ramamohanarao. "A taxonomy of Data Grids for distributed data sharing, management, and processing." ACM Computing Surveys, vol 31, no. 1, 2006. Google ScholarDigital Library
- D. Zilio, et al. "Recommending Materialized Views and Indexes with IBM DB2 Design Advisor," In Proceedings of ICAC, 2004. Google ScholarDigital Library
- D. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, and S. Fadden. "DB2 Design Advisor: Integrated Automatic Physical Database Design," In Proceedings of VLDB, 2004. Google ScholarDigital Library
Index Terms
Load distribution of analytical query workloads for database cluster architectures
Recommendations
Dynamic Materialization of Query Views for Data Warehouse Workloads
ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data EngineeringA materialized view, or Materialized Query Table (MQT), is an auxiliary table with precomputed data that can be used to significantly improve the performance of a database query. Previous research efforts have focused onfinding the best candidate MQT ...
Improvement Algorithms for Semijoin Query Processing Programs in Distributed Database Systems
The problem of optimal query processing in distributed database systems was shown to be NP-hard. This means that heuristic algorithms are necessary to solve the query processing problem. In this paper, we describe algorithms to improve the solutions ...
Comments