skip to main content
10.1145/2038916.2038923acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Incoop: MapReduce for incremental computations

Published:26 October 2011Publication History

ABSTRACT

Many online data sets evolve over time as new entries are slowly added and existing entries are deleted or modified. Taking advantage of this, systems for incremental bulk data processing, such as Google's Percolator, can achieve efficient updates. To achieve this efficiency, however, these systems lose compatibility with the simple programming models offered by non-incremental systems, e.g., MapReduce, and more importantly, requires the programmer to implement application-specific dynamic algorithms, ultimately increasing algorithm and code complexity.

In this paper, we describe the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations. Incoop detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism. To achieve efficiency without sacrificing transparency, we adopt recent advances in the area of programming languages to identify the shortcomings of task-level memoization approaches, and to address these shortcomings by using several novel techniques: a storage system, a contraction phase for Reduce tasks, and an affinity-based scheduling algorithm. We have implemented Incoop by extending the Hadoop framework, and evaluated it by considering several applications and case studies. Our results show significant performance improvements without changing a single line of application code.

References

  1. U. A. Acar, G. E. Blelloch, M. Blume, R. Harper, and K. Tangwongsan. An experimental analysis of self-adjusting computation. ACM Trans. Programming Languages and Systems, 32(1):1--53, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. U. A. Acar, G. E. Blelloch, and R. Harper. Adaptive functional programming. ACM Trans. Programming Languages and Systems, 28(6):990--1034, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Bhatotia, A. Wieder, I. E. Akkus, R. Rodrigues, and U. A. Acar. Large-scale incremental data processing with change propagation. In USENIX Workshop on Hot Topics in Cloud Computing (HotCloud'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient iterative data processing on large clusters. In 36th International Conference on Very Large Data Bases, Singapore, September 14--16, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Ceri and J. Widom. Deriving production rules for incremental view maintenance. In Proceedings of the 17th International Conference on Very Large Data Bases, pages 577--589, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Y.-J. Chiang and R. Tamassia. Dynamic algorithms in computational geometry. Proceedings of the IEEE, 80(9):1412--1434, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  7. T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. Mapreduce online. In Proc. 7th Symposium on Networked systems design and implementation (NSDI'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In Proc. 6th Symposium on Operating Systems Design and Implementation (OSDI'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Demetrescu, I. Finocchi, and G. Italiano. Handbook on Data Structures and Applications. CRC, 2005.Google ScholarGoogle Scholar
  10. P. K. Gunda, L. Ravindranath, C. A. Thekkath, Y. Yu, and L. Zhuang. Nectar: Automatic management of data and computation in data centers. In Proc. 9th Symp. Operating Systems Design and Implementation (OSDI'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Hammer, U. A. Acar, M. Rajagopalan, and A. Ghuloum. A proposal for parallel self-adjusting computation. In DAMP '07: Proceedings of the first workshop on Declarative Aspects of Multicore Programming, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. A. Hammer, U. A. Acar, and Y. Chen. CEAL: A C-based language for self-adjusting computation. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implimentation, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Cornet: batched stream processing for data intensive distributed computing. In Proc. 1st Symposium on Cloud computing (SoCC'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. Logothetis, C. Olston, B. Reed, K. C. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. In 1st Symp. on Cloud computing(SoCC'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Logothetis, C. Trezzo, K. C. Webb, and K. Yocum. In-situ mapreduce for log processing. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference, USENIXATC'11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proc. 18th Symp. on Operating systems principles (SOSP'01). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Olston and et.al. Nova: continuous pig/hadoop workflows. In Proceedings of the 2011 international conference on Management of data, SIGMOD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1099--1110, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In Proc. 9th Symposium on Operating Systems Design and Implementation (OSDI'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Popa, M. Budiu, Y. Yu, and M. Isard. DryadInc: Reusing work in large-scale computations. In Worksh. on Hot Topics in Cloud Computing (HotCloud'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. Ramalingam and T. Reps. A Categorized Bibliography on Incremental Computation. In Proc. 20th Symp. Princ. of Progr. Languages (POPL'93). Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving mapreduce performance in heterogeneous environments. In Proceedings of the 8th USENIX conference on Operating systems design and implementation, OSDI'08, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Incoop: MapReduce for incremental computations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
          October 2011
          377 pages
          ISBN:9781450309769
          DOI:10.1145/2038916

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 October 2011

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate169of722submissions,23%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader