skip to main content
10.1145/1353343.1353367acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

Load distribution of analytical query workloads for database cluster architectures

Published:25 March 2008Publication History

ABSTRACT

Enterprises may have multiple database systems spread across the organization for redundancy or for serving different applications. In such systems, query workloads can be distributed across different servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxiliary table with pre-computed data that can be used to significantly improve the performance of a database query. In this paper, we propose a framework for coordinating execution of OLAP query workloads across a database cluster with shared nothing architecture. Such coordination is complex since we need to consider (1) the time to build the MQTs, (2) the query execution impact of the MQTs, (3) whether the MQTs can fit in the disk space limitation, (4) server computation power, and (5) the effectiveness of the scheduling and placement algorithms in deriving a combination of configurations so that the workload can be completed in the shortest time period. We frame the problem as a combinatorial problem with a solution space that is exponential in the number of queries, MQTs, and servers. We provide a stochastic search heuristic that finds a near-optimal mapping of queries-to-servers and MQTs-to-servers within an arbitrarily bounded time and compare our solution with an exhaustive search and three standard greedy algorithms. Our search implementation produced schedules within 9% of the optimal found through an exhaustive search and produced better solutions than typical greedy algorithms for both TPC-H and synthetic benchmarks under a variety of experiments. For a key trial where disk space is limited, it produced 15% better results than the next best competitor, corresponding to an absolute wall clock advantage of over 10 hours.

References

  1. S. Agrawal, S. Chaudhuri, and V. Narasayya. "Automated selection of materialized views and indexes for SQL database," In Proceedings of VLDB. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Agrawal, E. Chu, and V. Narasayya. "Automatic physical design tuning: workload as a sequence," In Proceedings of SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A. Mandal, and K. Kennedy. "Task scheduling strategies for workflow-based applications in grids," In Proceedings of CCGrid, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Braun, et al. "A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems," Journal of Parallel and Distributed Computing, vol. 61, no. 6, June 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Casanova, A. Legrand, D. Zagorodnow, and F. Berman. "Heuristics for scheduling parameter sweep applications in grid environments," In Proceedings of the Heterogeneous Computing Workshop, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Chakrabarti, et al. "Integration of Scheduling and Replication in Data Grids." In Proceedings of the International Conference on High Performance Computing, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Chaudhuri, R. Krishnamurthy, S. Potmianos, and K. Shim. "Optimizing Queries with Materialized Views," In Proceedings of ICDE, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Chen. "Optimal file allocation in multilevel storage systems." In Proceedings of AFIPS, 1973.Google ScholarGoogle Scholar
  9. L. Davis. "Job Shop Scheduling with Genetic Algorithms," In Proceedings of the International Conference on Genetic Algorithms, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. Dowdy and D. Foster. "Comparative Models of the File Assignment Problem." ACM Computing Surveys, vol 14, no. 2, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Eiben and J. Smith. Introduction to Evolutionary Computing, Springer 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Foster, L. Dowdy, and J. Ames. "File assignment in a computer network." Computer Networks 5, Sept. 1981.Google ScholarGoogle Scholar
  13. D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Kluwer Academic, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Holland. Adaptation in natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Hughes and G. Moe. "A structural approach to computer performance analysis." In Proceedings of AFIPS, 1973.Google ScholarGoogle Scholar
  16. IBM DB2 9, www.ibm.com/software/data/db2/9/Google ScholarGoogle Scholar
  17. H. Jiang, D. Gao, W.-S. Li. "Exploiting Correlation and Parallelism for Materialized-View Recommendation in Distributed Data Warehouses," In Proceedings of ICDE, 2007.Google ScholarGoogle Scholar
  18. P. Larson and H. Yang. "Computing Queries From Derived Relations," In Proceedings of VLDB, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. W.-S. Li, V. Batra, V. Raman, W. Han, K. Candan, and I. Narang. "Load and Network Aware Query Routing for Information Integration," In Proceedings of ICDE, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Z. Michaelwicz and D. Fogel. How to Solve It: Modern Heuristics, Springer 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Microsoft SQL Server, www.microsoft.com/sq1/default.mspxGoogle ScholarGoogle Scholar
  22. H. Mohamed and D. Epema. "An evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusers." In Proceedings of the IEEE International Conference on Cluster Computing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. MySQL Cluster www.mysql.com/products/database/cluster/.Google ScholarGoogle Scholar
  24. Oracle, www.oracle.comGoogle ScholarGoogle Scholar
  25. Oracle 11g Real Application Clusters, www.oracle.com/technology/products/database/clustering/index.htmlGoogle ScholarGoogle Scholar
  26. K. Ranganathan and I. Foster. "Computation Scheduling and Data Replication Algorithms for Data Grids." Grid Resource Management: State of the Art and Future Trends. Kluwer Academic Publishers, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. Rao, C. Zhang, N. Megiddo, and G. Lohman. "Automating physical database design in a parallel database," In Proceedings of SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Redbrick, www.informix.comGoogle ScholarGoogle Scholar
  29. E. Rudensteiner, A. Koeller, and X. Zhang. "Maintaining Data Warehouses over Changing Information Sources," Communications of the ACM, vol. 43, no. 6, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. K. Salem, K. Beyer, R. Cochrane, and B. Lindsay. "How to roll a join: Asynchronous Incremental View Maintenance," In Proceedings of SIGMOD, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. E. Santos-Neto, W. Cirne, F. Brasileiro, and A. Lima. "Exploiting Replications and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids." In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. E. Schmueli and D. Feitelson. "Backfilling with lookahed to optimize the packing of parallel jobs," Springer-Verlag Lecture Notes in Computer Science, vol 2862, 2003.Google ScholarGoogle Scholar
  33. A. Venugopal, R. Buyya, and K. Ramamohanarao. "A taxonomy of Data Grids for distributed data sharing, management, and processing." ACM Computing Surveys, vol 31, no. 1, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. D. Zilio, et al. "Recommending Materialized Views and Indexes with IBM DB2 Design Advisor," In Proceedings of ICAC, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, and S. Fadden. "DB2 Design Advisor: Integrated Automatic Physical Database Design," In Proceedings of VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Load distribution of analytical query workloads for database cluster architectures

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
          March 2008
          762 pages
          ISBN:9781595939265
          DOI:10.1145/1353343

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 25 March 2008

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate7of10submissions,70%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader