
Adding the easy button to the cloud with SnowFlock and MPI

Published: 31 March 2009
DOI: 10.1145/1519138.1519139

ABSTRACT

Cloud computing promises to provide researchers with the ability to perform parallel computations using large pools of virtual machines (VMs), without the burden of owning or maintaining physical infrastructure. However, with easy access to hundreds of VMs also comes an increased management burden. Cloud users today must manually instantiate, configure, and maintain the virtual hosts in their cluster. They must learn new cloud APIs that are not germane to the problem of parallel processing. These APIs usually take several minutes to perform their VM-management tasks, forcing users to keep VMs idling and pay for unused processing time rather than shutting VMs down and powering them back on as needed. Furthermore, users must still configure their cluster management framework to launch their parallel jobs.

In this paper we show that all of this management pain is unnecessary. We show how to combine a cloud API -- SnowFlock -- and a parallel processing framework -- MPI -- to truly realize the potential of the cloud. SnowFlock allows users to fork VMs as if they were processes, spanning multiple physical hosts in sub-second time. We exploit the synergy between this paradigm and MPI's job management to hide all details of cloud management from the user. By maintaining a single VM and starting unmodified applications with familiar MPI commands, a user can instantly leverage hundreds of processors to perform a parallel computation. Besides making the use of cloud resources trivial, we also eliminate the cost of idling: VMs exist only for as long as they are involved in computation.
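To make the fork-like paradigm concrete, the following is a minimal sketch in C of what the integration layer behind an MPI launcher could do with a SnowFlock-style API. The sf_* prototypes and the helper launch_mpi_rank() are illustrative assumptions modeled on the fork semantics described in the companion SnowFlock paper (Lagar-Cavilla et al., EuroSys 2009); they are not a published header.

    /* Sketch: how an MPI launcher might drive SnowFlock-style VM forking.
     * All sf_* prototypes and launch_mpi_rank() are assumed interfaces
     * for illustration, not a real SnowFlock header. */
    typedef int sf_ticket_t;

    sf_ticket_t sf_request_ticket(int nclones);     /* reserve clone slots       */
    int  sf_clone(sf_ticket_t t);                   /* fork the VM: returns 0 in */
                                                    /* the parent, a clone ID > 0 */
                                                    /* in each child clone        */
    void sf_exit(void);                             /* child clone discards itself */
    void sf_join(sf_ticket_t t);                    /* parent waits for all clones */
    void launch_mpi_rank(int id, int argc, char **argv); /* hypothetical: run one rank */

    int main(int argc, char **argv)
    {
        int nclones = 32;                           /* desired worker VMs */
        sf_ticket_t t = sf_request_ticket(nclones);

        int id = sf_clone(t);                       /* sub-second VM fork */
        if (id > 0) {
            /* Child clone: run one rank of the unmodified MPI application,
             * then vanish; the VM exists only while it computes. */
            launch_mpi_rank(id, argc, argv);
            sf_exit();
        }

        /* Parent: wait for all ranks to finish, releasing the clone VMs. */
        sf_join(t);
        return 0;
    }

From the user's point of view, none of this machinery is visible: one maintains a single VM and types a familiar command such as mpirun -np 32 ./app (a hypothetical application name), and the modified launcher performs the fork, rank startup, and teardown sketched above.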

Published in

HPCVirt '09: Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing
March 2009, 42 pages
ISBN: 9781605584652
DOI: 10.1145/1519138

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
