research-article

Decoupling storage and computation in Hadoop with SuperDataNodes

Published: 14 April 2010

Abstract

The rise of ad-hoc data-intensive computing has led to the development of data-parallel programming systems such as Map/Reduce and Hadoop, which achieve scalability by tightly coupling storage and computation. This coupling can be limiting when the ratio of computation to storage is not known in advance or changes over time. In this work, we examine decoupling storage and computation in Hadoop through SuperDataNodes, which are servers that contain an order of magnitude more disks than traditional Hadoop nodes. We found that SuperDataNodes are not only capable of supporting workloads with high storage-to-processing ratios, but in some cases can outperform traditional Hadoop deployments through better management of a large centralized pool of disks.

Published in

ACM SIGOPS Operating Systems Review, Volume 44, Issue 2
April 2010
92 pages
ISSN: 0163-5980
DOI: 10.1145/1773912

Copyright © 2010 Author

Publisher

Association for Computing Machinery
New York, NY, United States
