Decoupling storage and computation in Hadoop with SuperDataNodes

Author:
George Porter

UC San Diego, La Jolla, CA


ACM SIGOPS Operating Systems Review, Volume 44, Issue 2, April 2010, pp. 41–46. https://doi.org/10.1145/1773912.1773923

Published: 14 April 2010

Abstract

The rise of ad-hoc data-intensive computing has led to the development of data-parallel programming systems such as Map/Reduce and Hadoop, which achieve scalability by tightly coupling storage and computation. This can be limiting when the ratio of computation to storage is not known in advance, or changes over time. In this work, we examine decoupling storage and computation in Hadoop through SuperDataNodes, which are servers that contain an order of magnitude more disks than traditional Hadoop nodes. We found that SuperDataNodes are not only capable of supporting workloads with high storage-to-processing ratios, but in some cases can outperform traditional Hadoop deployments through better management of a large centralized pool of disks.

Published in

ACM SIGOPS Operating Systems Review Volume 44, Issue 2
April 2010
92 pages
ISSN: 0163-5980
DOI: 10.1145/1773912

Copyright © 2010 Author
Publisher
Association for Computing Machinery
New York, NY, United States