ABSTRACT
Cloud computing promises to provide researchers with the ability to perform parallel computations using large pools of virtual machines (VMs), without the burden of owning or maintaining physical infrastructure. However, easy access to hundreds of VMs also brings an increased management burden. Cloud users today must manually instantiate, configure, and maintain the virtual hosts in their cluster. They must learn new cloud APIs that are not germane to the problem of parallel processing. Those APIs usually take several minutes to perform their VM-management tasks, forcing users to keep VMs idling and pay for unused processing time rather than shut VMs down and power them on as needed. Furthermore, users must still configure their cluster management framework to launch their parallel jobs.
In this paper we show that all this management pain is unnecessary. We show how to combine a cloud API -- SnowFlock -- and a parallel processing framework -- MPI -- to truly realize the potential of the cloud. SnowFlock allows users to fork VMs as if they were processes, occupying multiple physical hosts in sub-second time. We exploit the synergy between this paradigm and MPI's job management to completely hide all details of cloud management from the user. By maintaining a single VM and starting unmodified applications with familiar MPI commands, a user can instantaneously leverage hundreds of processors to perform a parallel computation. Besides making the use of cloud resources trivial, we also eliminate the cost of idling -- VMs exist only for as long as they are involved in computation.
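The phrase "fork VMs as if they were processes" can be made concrete with ordinary operating-system processes. The sketch below is only an analogy: it is not SnowFlock's API, and it involves no VMs and no MPI. It shows the fork, compute, join pattern on a single POSIX machine, where each child inherits the parent's state at fork time and exists only while it has work to do. The names `partial_sum` and `fork_join_sum` are hypothetical, chosen for illustration.

```python
import os
import struct


def partial_sum(chunk):
    """Stand-in for the per-worker computation (sum of squares)."""
    return sum(x * x for x in chunk)


def fork_join_sum(data, n_workers):
    """Fork n_workers children, give each a slice of data, and join the results.

    Analogy only: real VM cloning would place workers on other physical
    hosts; here every worker is a local process (POSIX only).
    """
    workers = []
    for rank in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: it inherited `data` from the parent at fork time.
            status = 1
            try:
                os.close(read_fd)
                result = partial_sum(data[rank::n_workers])
                os.write(write_fd, struct.pack("!q", result))
                os.close(write_fd)
                status = 0
            finally:
                # The child ends as soon as its share of the work is done.
                os._exit(status)
        # Parent: keep the read end and move on to the next worker.
        os.close(write_fd)
        workers.append((pid, read_fd))

    total = 0
    for pid, read_fd in workers:
        payload = os.read(read_fd, 8)  # one signed 64-bit integer per worker
        total += struct.unpack("!q", payload)[0]
        os.close(read_fd)
        os.waitpid(pid, 0)  # join: reap the finished child
    return total
```

Calling `fork_join_sum(list(range(10)), 3)` splits the ten inputs across three short-lived children and adds up their partial results in the parent.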
- Adding the easy button to the cloud with SnowFlock and MPI