skip to main content
10.1145/2983323.2983773acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Cost-Effective Stream Join Algorithm on Cloud System

Authors Info & Claims
Published:24 October 2016Publication History

ABSTRACT

Matrix-based scheme (Join-Matrix) can prefectly support distributed stream joins, especially for arbitrary join predicates, because it guarantees any tuples from two streams to meet with each other. However,the dynamics and unpredictability features of stream require quick actions on scheme changing. Otherwise, they may lead to degradation of system throughputs and increament of processing latency with the waste of system resources, such as CPUs and Memories. Since Join-Matrix model has the fixed processing architecture with replicated data, these kinds of adverseness will be magnified. Therefore, it is urgent to find a solution that preserves advantages of Join-Matrix model and promises a good usage to computation resources when it meets scheme changing. In this paper, we propose a cost-effective stream join algorithm, which ensures the adaptability of Join-Matrix but with lower resources consumption. Specifically, a varietal matrix generation algorithm is proposed to generate an irregular matrix scheme for assigning the minimal number of tasks; a lightweight migration algorithm is designed to ensure state migration at a low cost; a complete load balance process framework is described to guarantee the correctness during the scheme changing. We conduct extensive experiments to compare our method with baseline systems on both benchmarks and real-workloads, and explain the results in detail.

References

  1. Apache Storm. http://storm.apache.org/.Google ScholarGoogle Scholar
  2. The TPC-H Benchmark. http://www.tpc.org/tpch.Google ScholarGoogle Scholar
  3. R. Ananthanarayanan, V. Basker, S. Das, and et al. Photon: fault-tolerant and scalable joining of continuous data streams. In SIGMOD, pages 577--588, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J.-P. Dittrich, B. Seeger, and et al. Progressive merge join: A genetic and non-blocking sort-based join algorithm. In VLDB, pages 299--310, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Elseidy, A. Elguindy, A.and Vitorovic, and C. Koch. Scalable and adaptive online joins. In VLDB, pages 441--452, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. S. Epstein, M. Stonebraker, and E. Wong. Distributed query processing in a relational data base system. In SIGMOD, pages 169--180, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Fang, R. Zhang, X. Wang, and Z. A. Flexible and adaptive stream join algorithm. In APWeb, Accepted, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  9. B. Gedik. Partitioning functions for stateful data parallelism in stream processing. VLDBJ, 23(4):517--539, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Huebsch, M. Garofalakis, J. Hellerstein, and I. Stoica. Advanced join strategies for large-scale distributed computation. In VLDB, pages 1484--1495, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Z. Ives, D. Florescu, M. Friedman, A. Levy, and D. Weld. An adaptive query execution system for data integration. In SIGMOD, pages 299--310, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Y. Kwon, M. Balazinska, and et al. Skewtune: mitigating skew in mapreduce applications. In SIGMOD, pages 25--36, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Q. Lin, B. C. Ooi, Z. Wang, and C. Yu. Scalable distributed stream join processing. In SIGMOD, pages 811--825, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. Liu, Y. Zhu, M. Jbantova, and et al. A dynamically adaptive distributed system for processing complex continuous queries. In VLDB, pages 1338--1341, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Mokbel, M. Lu, and W. Aref. Hash-merge join: Non-blocking join algorithm for producing fast and early join results. In ICDE, pages 251--263, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. A. U. Nasir, M. Serafini, and et al. When two choices are not enough: Balancing at scale in distributed stream processing. In ICDE, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  17. A. Okcan and M. Riedewald. Processing theta-joins using mapreduce. In SIGMOD, pages 949--960, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Z. Qian, Y. He, C. Su, and et al. Timestream: reliable stream computation in the cloud. In Eurosys, pages 1--14, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. W. Stamos and H. C. Young. A symmetric and replicate algorithm for distributed joins. IEEE Transactions on Parallel and Distributed Systems, 4(12):1345--1354, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. N. ufler, B.and Augsten, A. Reiser, and A. Kemper. Load balancing in mapreduce based on scalable cardinality estimates. In ICDE, pages 522--533, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. Urhan and M. Franklin. Dynamic pipeline scheduling for improving interactive query performance. In VLDB, pages 501--510, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Vitorovic, M. ElSeidy, and C. Koch. Load balancing and skew resilience for parallel joins. In ICDE, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  23. A. Wilschut and P. Apers. Dataflow query execution in a aarallel main-memory environment. Distributed and Parallel Databases, 1(1):103--128, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Y. Xing, J. Hwang, U. Cetintemel, and S. Zdonik. Providing resiliency to load variations in distributed stream processing. In VLDB, pages 775--786, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD, pages 1043--1052, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Zaharia, T. Das, and et al. Discretized streams: fault-tolerant streaming computation at scale. In SIGOPS, pages 423--438, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cost-Effective Stream Join Algorithm on Cloud System

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
            October 2016
            2566 pages
            ISBN:9781450340731
            DOI:10.1145/2983323

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 24 October 2016

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            CIKM '16 Paper Acceptance Rate160of701submissions,23%Overall Acceptance Rate1,861of8,427submissions,22%

            Upcoming Conference

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader