skip to main content
10.1145/1989284.1989310acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Parallel evaluation of conjunctive queries

Published:13 June 2011Publication History

ABSTRACT

The availability of large data centers with tens of thousands of servers has led to the popular adoption of massive parallelism for data analysis on large datasets. Several query languages exist for running queries on massively parallel architectures, some based on the MapReduce infrastructure, others using proprietary implementations. Motivated by this trend, this paper analyzes the parallel complexity of conjunctive queries. We propose a very simple model of parallel computation that captures these architectures, in which the complexity parameter is the number of parallel steps requiring synchronization of all servers. We study the complexity of conjunctive queries and give a complete characterization of the queries which can be computed in one parallel step. These form a strict subset of hierarchical queries, and include flat queries like R(x,y), S(x,z), T(x,v), U(x,w), tall queries like R(x), S(x,y), T(x,y,z), U(x,y,z,w), and combinations thereof, which we call tall-flat queries. We describe an algorithm for computing in parallel any tall-flat query, and prove that any query that is not tall-flat cannot be computed in one step in this model. Finally, we present extensions of our results to queries that are not tall-flat.

References

  1. F. N. Afrati and J. D. Ullman. Optimizing joins in a map-reduce environment. In EDBT, pages 99--110, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Alvaro, W. Marczak, N. Conway, J. M. Hellerstein, D. Maier, and R. C. Sears. Dedalus: Datalog in time and space. Technical Report UCB/EECS-2009-173, EECS Department, University of California, Berkeley, Dec 2009.Google ScholarGoogle Scholar
  3. L. Carter and M. N. Wegman. Universal classes of hash functions. J. Comput. Syst. Sci., 18(2):143--154, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  4. R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. Scope: easy and efficient parallel processing of massive data sets. Proc. VLDB Endow., 1:1265--1276, August 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Cohen. Containment of aggregate queries. SIGMOD Record, 34(1):77--85, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. E. Santos, R. Subramonian, and T. von Eicken. Logp: Towards a realistic model of parallel computation. In PPOPP, pages 1--12, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDB J., 16(4):523--544, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6):85--98, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a highlevel dataflow system on top of mapreduce: The pig experience. PVLDB, 2(2):1414--1425, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Grohe, Y. Gurevich, D. Leinders, N. Schweikardt, J. Tyszkiewicz, and J. V. den Bussche. Database query processing using finite cursor machines. Theory Comput. Syst., 44(4):533--560, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. M. Hellerstein. The declarative imperative: experiences and conjectures in distributed logic. SIGMOD Rec., 39:5--19, September 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Immerman. Expressibility and parallel complexity. SIAM J. Comput., 18(3):625--638, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938--948, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Koutris and D. Suciu. Parallel evaluation of conjunctive queries. Research Report UW-CSE-11-03-01, University of Washington, 2011.Google ScholarGoogle Scholar
  16. L. Libkin. Elements of Finite Model Theory. Springer, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive analysis of web-scale datasets. PVLDB, 3(1):330--339, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Pagh and R. Pagh. Uniform hashing in constant time and optimal space. SIAM J. Comput., 38(1):85--96, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Raab and A. Steger. "balls into bins" - a simple and tight analysis. In RANDOM, pages 159--170, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Sanders. On the competitive analysis of randomized static load balancing. In Proceedings of the first Workshop on Randomized Parallel Algorithms, RANDOM, 1996.Google ScholarGoogle Scholar
  21. L. J. Stockmeyer and U. Vishkin. Simulation of parallel random access machines by circuits. SIAM J. Comput., 13(2):409--422, 1984.Google ScholarGoogle ScholarCross RefCross Ref
  22. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - a warehousing solution over a map-reduce framework. PVLDB, 2(2):1626--1629, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1043--1052, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. Dryadlinq: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI, pages 1--14, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Parallel evaluation of conjunctive queries

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
      June 2011
      332 pages
      ISBN:9781450306607
      DOI:10.1145/1989284

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 June 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      PODS '11 Paper Acceptance Rate25of113submissions,22%Overall Acceptance Rate642of2,707submissions,24%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader