research-article

Incoop: MapReduce for incremental computations

Authors:
Pramod Bhatotia

Max Planck Institute for Software Systems (MPI-SWS)

Max Planck Institute for Software Systems (MPI-SWS)
View Profile

,
Alexander Wieder

Max Planck Institute for Software Systems (MPI-SWS)

Max Planck Institute for Software Systems (MPI-SWS)
View Profile

,
Rodrigo Rodrigues

Max Planck Institute for Software Systems (MPI-SWS)

Max Planck Institute for Software Systems (MPI-SWS)
View Profile

,
Umut A. Acar

Max Planck Institute for Software Systems (MPI-SWS)

Max Planck Institute for Software Systems (MPI-SWS)
View Profile

,
Rafael Pasquin

Universidade Federal de Uberlândia (FACOM/UFU)

Universidade Federal de Uberlândia (FACOM/UFU)
View Profile

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud ComputingOctober 2011Article No.: 7Pages 1–14https://doi.org/10.1145/2038916.2038923

Published:26 October 2011Publication History

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

Pages 1–14

ABSTRACT

Many online data sets evolve over time as new entries are slowly added and existing entries are deleted or modified. Taking advantage of this, systems for incremental bulk data processing, such as Google's Percolator, can achieve efficient updates. To achieve this efficiency, however, these systems lose compatibility with the simple programming models offered by non-incremental systems, e.g., MapReduce, and more importantly, requires the programmer to implement application-specific dynamic algorithms, ultimately increasing algorithm and code complexity.

In this paper, we describe the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations. Incoop detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism. To achieve efficiency without sacrificing transparency, we adopt recent advances in the area of programming languages to identify the shortcomings of task-level memoization approaches, and to address these shortcomings by using several novel techniques: a storage system, a contraction phase for Reduce tasks, and an affinity-based scheduling algorithm. We have implemented Incoop by extending the Hadoop framework, and evaluated it by considering several applications and case studies. Our results show significant performance improvements without changing a single line of application code.

References

U. A. Acar, G. E. Blelloch, M. Blume, R. Harper, and K. Tangwongsan. An experimental analysis of self-adjusting computation. ACM Trans. Programming Languages and Systems, 32(1):1--53, 2009. Google ScholarDigital Library
U. A. Acar, G. E. Blelloch, and R. Harper. Adaptive functional programming. ACM Trans. Programming Languages and Systems, 28(6):990--1034, 2006. Google ScholarDigital Library
P. Bhatotia, A. Wieder, I. E. Akkus, R. Rodrigues, and U. A. Acar. Large-scale incremental data processing with change propagation. In USENIX Workshop on Hot Topics in Cloud Computing (HotCloud'11). Google ScholarDigital Library
Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient iterative data processing on large clusters. In 36th International Conference on Very Large Data Bases, Singapore, September 14--16, 2010. Google ScholarDigital Library
S. Ceri and J. Widom. Deriving production rules for incremental view maintenance. In Proceedings of the 17th International Conference on Very Large Data Bases, pages 577--589, 1991. Google ScholarDigital Library
Y.-J. Chiang and R. Tamassia. Dynamic algorithms in computational geometry. Proceedings of the IEEE, 80(9):1412--1434, 1992.Google ScholarCross Ref
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. Mapreduce online. In Proc. 7th Symposium on Networked systems design and implementation (NSDI'10). Google ScholarDigital Library
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In Proc. 6th Symposium on Operating Systems Design and Implementation (OSDI'04). Google ScholarDigital Library
C. Demetrescu, I. Finocchi, and G. Italiano. Handbook on Data Structures and Applications. CRC, 2005.Google Scholar
P. K. Gunda, L. Ravindranath, C. A. Thekkath, Y. Yu, and L. Zhuang. Nectar: Automatic management of data and computation in data centers. In Proc. 9th Symp. Operating Systems Design and Implementation (OSDI'10). Google ScholarDigital Library
M. Hammer, U. A. Acar, M. Rajagopalan, and A. Ghuloum. A proposal for parallel self-adjusting computation. In DAMP '07: Proceedings of the first workshop on Declarative Aspects of Multicore Programming, 2007. Google ScholarDigital Library
M. A. Hammer, U. A. Acar, and Y. Chen. CEAL: A C-based language for self-adjusting computation. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implimentation, June 2009. Google ScholarDigital Library
B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Cornet: batched stream processing for data intensive distributed computing. In Proc. 1st Symposium on Cloud computing (SoCC'10). Google ScholarDigital Library
D. Logothetis, C. Olston, B. Reed, K. C. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. In 1st Symp. on Cloud computing(SoCC'10). Google ScholarDigital Library
D. Logothetis, C. Trezzo, K. C. Webb, and K. Yocum. In-situ mapreduce for log processing. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference, USENIXATC'11, 2011. Google ScholarDigital Library
A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proc. 18th Symp. on Operating systems principles (SOSP'01). Google ScholarDigital Library
C. Olston and et.al. Nova: continuous pig/hadoop workflows. In Proceedings of the 2011 international conference on Management of data, SIGMOD, 2011. Google ScholarDigital Library
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1099--1110, 2008. Google ScholarDigital Library
D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In Proc. 9th Symposium on Operating Systems Design and Implementation (OSDI'10). Google ScholarDigital Library
L. Popa, M. Budiu, Y. Yu, and M. Isard. DryadInc: Reusing work in large-scale computations. In Worksh. on Hot Topics in Cloud Computing (HotCloud'09). Google ScholarDigital Library
G. Ramalingam and T. Reps. A Categorized Bibliography on Incremental Computation. In Proc. 20th Symp. Princ. of Progr. Languages (POPL'93). Google ScholarDigital Library
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving mapreduce performance in heterogeneous environments. In Proceedings of the 8th USENIX conference on Operating systems design and implementation, OSDI'08, 2008. Google ScholarDigital Library

Index Terms

Incoop: MapReduce for incremental computations
1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Recommendations

IncApprox: A Data Analytics System for Incremental Approximate Computing
WWW '16: Proceedings of the 25th International Conference on World Wide Web

Incremental and approximate computations are increasingly being adopted for data analytics to achieve low-latency execution and efficient utilization of computing resources. Incremental computation updates the output incrementally instead of re-...
Read More
Incremental computation with names
OOPSLA '15

Over the past thirty years, there has been significant progress in developing general-purpose, language-based approaches to incremental computation, which aims to efficiently update the result of a computation when an input is changed. A key design ...
Read More
Incremental computation with names
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications

Over the past thirty years, there has been significant progress in developing general-purpose, language-based approaches to incremental computation, which aims to efficiently update the result of a computation when an input is changed. A key design ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN:9781450309769
DOI:10.1145/2038916
Program Chairs:
Jeffrey S. Chase
Duke University
,
Amr El Abbadi
Univ of California, Santa Barbara
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 October 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
memoization
self-adjusting computation
stability
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate169of722submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 230
  Total Citations
  View Citations
- 1,599
  Total Downloads
- Downloads (Last 12 months)27
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Incoop: MapReduce for incremental computations

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

IncApprox: A Data Analytics System for Incremental Approximate Computing

Incremental computation with names

Incremental computation with names

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Incoop: MapReduce for incremental computations

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

IncApprox: A Data Analytics System for Incremental Approximate Computing

Incremental computation with names

Incremental computation with names

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media