DOI: 10.1145/1996109.1996115
research-article

Cumulus: an open source storage cloud for science

Published: 08 June 2011

ABSTRACT

Amazon's S3 protocol has emerged as the de facto interface for storage in the commercial data cloud. However, it is closed source and unavailable to the numerous science data centers all over the country. Just as Amazon's Simple Storage Service (S3) provides reliable data cloud access to commercial users, scientific data centers must provide their users with a similar level of service. Ideally, scientific data centers could allow the use of the same clients and protocols that have proven effective for Amazon's users. But how well does the S3 REST interface compare with the data cloud transfer services used in today's computational centers? Does it have the features needed to support the scientific community? If not, can it be extended to include these features without loss of compatibility? Can it scale and distribute resources equally when presented with common scientific usage patterns?

We address these questions by presenting Cumulus, an open source implementation of the Amazon S3 REST API. It is packaged with the Nimbus IaaS toolkit and provides scalable and reliable access to scientific data. Its performance compares favorably with that of GridFTP and SCP, and we have added features necessary to support the econometrics important to the scientific community.

    Published in

      ScienceCloud '11: Proceedings of the 2nd international workshop on Scientific cloud computing
      June 2011
      74 pages
      ISBN: 9781450306997
      DOI: 10.1145/1996109
      General Chairs: Ioan Raicu, Pete Beckman, Ian T. Foster
      Program Chair: Yogesh Simmhan

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 44 of 151 submissions, 29%
