Nobody ever got fired for using Hadoop on a cluster

Authors:

Andrew Douglas

HotCDP '12: Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing

Article No.: 2, Pages 1 - 5

https://doi.org/10.1145/2169090.2169092

Published: 10 April 2012

Abstract

The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that "nobody ever got fired for using Hadoop on a cluster"!

We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. In this position paper, however, we ask whether it is the right path for general-purpose data analytics. Evidence suggests that many MapReduce-like jobs process relatively small input data sets (less than 14 GB). Memory has reached a GB/$ ratio at which it is now technically and financially feasible to have servers with hundreds of gigabytes of DRAM. We therefore ask whether we should be scaling by using single machines with very large memories rather than clusters. We conjecture that, in terms of hardware and programmer time, this may be a better option for the majority of data processing jobs.

Index Terms

Software and its engineering
  Software organization and properties
    Software system structures
      Distributed systems organizing principles
        Organizing principles for web applications

Published In

HotCDP '12: Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing

April 2012

26 pages

ISBN:9781450311625

DOI:10.1145/2169090

Program Chairs: Christof Fetzer (TU Dresden), Flavio Junqueira (Yahoo! Research), Peter Pietzuch (Imperial College London)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

EuroSys '12

Sponsor:

SIGOPS

EuroSys '12: Seventh EuroSys Conference 2012

April 10, 2012

Bern, Switzerland
