ABSTRACT
Amazon's S3 protocol has emerged as the de facto interface for storage in the commercial data cloud. However, it is closed source and unavailable to the numerous science data centers all over the country. Just as Amazon's Simple Storage Service (S3) provides reliable data cloud access to commercial users, scientific data centers must provide their users with a similar level of service. Ideally, scientific data centers could allow the use of the same clients and protocols that have proven effective for Amazon's users. But how well does the S3 REST interface compare with the data cloud transfer services used in today's computational centers? Does it have the features needed to support the scientific community? If not, can it be extended to include these features without loss of compatibility? Can it scale and distribute resources equally when presented with common scientific usage patterns?
We address these questions by presenting Cumulus, an open source implementation of the Amazon S3 REST API. It is packaged with the Nimbus IaaS toolkit and provides scalable and reliable access to scientific data. Its performance compares favorably with that of GridFTP and SCP, and we have added features necessary to support the econometrics important to the scientific community.
Index Terms
- Cumulus: an open source storage cloud for science