ABSTRACT
Continuous integration, delivery, and deployment (CICD) is widely used in DevOps communities because it allows teams of all sizes to deploy rapidly changing hardware and software resources quickly and confidently. In this paper, we describe how University of Colorado Boulder Research Computing has adopted these practices on the RMACC Summit supercomputer [17] so that system engineers and researchers alike can benefit from CICD-centric development workflows. We introduce CICD at a high level and describe how such practices can ease common software management challenges for High-Performance Computing (HPC) resources. We then document the infrastructure deployed for Summit and explain how software such as Jenkins and Singularity enabled its adaptation to an HPC environment. We conclude with two case studies on the use of our CICD infrastructure: one from the perspective of a system engineer maintaining user-facing resources, and the other from the perspective of a researcher developing, maintaining, and using the MFiX-Exa codebase.
REFERENCES
- 2018. AMREX: Block-Structured AMR Software Framework and Applications. https://amrex-codes.github.io/. (2018). [Online; accessed 06-March-2018].
- 2018. Docker Compose Documentation. https://docs.docker.com/compose/. (2018). [Online; accessed 06-March-2018].
- 2018. Docker Hub. https://hub.docker.com/. (2018). [Online; accessed 06-March-2018].
- 2018. GitLab - The only product for the complete DevOps lifecycle. https://about.gitlab.com/. (2018). [Online; accessed 06-March-2018].
- 2018. Jenkins. https://jenkins.io/. (2018). [Online; accessed 06-March-2018].
- 2018. JupyterHub. https://jupyterhub.readthedocs.io/en/stable/. (2018). [Online; accessed 06-March-2018].
- 2018. MFiX-Exa: multiphase flow with interphase exchanges for exascale. https://amrex-codes.github.io/MFIX-Exa/docs_html/Introduction.html. (2018). [Online; accessed 06-March-2018].
- 2018. MFiX: multiphase flow with interphase exchanges. https://mfix.netl.doe.gov/. (2018). [Online; accessed 06-March-2018].
- 2018. Puppet - Get on the shortest path to better software. https://puppet.com/. (2018). [Online; accessed 06-March-2018].
- 2018. Singularity. http://singularity.lbl.gov/. (2018). [Online; accessed 06-March-2018].
- 2018. Singularity Global Client. https://singularityhub.github.io/sregistry-cli/. (2018). [Online; accessed 06-March-2018].
- 2018. Singularity RC Base Image. https://github.com/ResearchComputing/singularity-slurm-base. (2018). [Online; accessed 06-March-2018].
- 2018. Singularity Registry. (2018). [Online; accessed 06-March-2018].
- 2018. SSH Plugin - Jenkins. https://wiki.jenkins.io/display/JENKINS/SSH+plugin. (2018). [Online; accessed 06-March-2018].
- 2018. The Jupyter Notebook. https://jupyter-notebook.readthedocs.io/en/stable/. (2018). [Online; accessed 06-March-2018].
- Charles Anderson. 2015. Docker [Software Engineering]. IEEE Software 32, 3 (2015), 102--c3.
- Jonathon Anderson, Patrick J Burns, Daniel Milroy, Peter Ruprecht, Thomas Hauser, and Howard Jay Siegel. 2017. Deploying RMACC Summit: an HPC resource for the Rocky Mountain Region. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact. ACM, 8.
- Richard Shane Canon and Doug Jacobsen. 2016. Shifter: containers for HPC. Proceedings of the Cray User Group (2016).
- Theo Combe, Antony Martin, and Roberto Di Pietro. 2016. To Docker or not to Docker: A security perspective. IEEE Cloud Computing 3, 5 (2016), 54--62.
- Todd Gamblin, Matthew LeGendre, Michael R Collette, Gregory L Lee, Adam Moody, Bronis R de Supinski, and Scott Futral. 2015. The Spack package manager: bringing order to HPC software chaos. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 40.
- Markus Geimer, Kenneth Hoste, and Robert McLay. 2014. Modern scientific software management using EasyBuild and Lmod. In HPC User Support Tools (HUST), 2014 First International Workshop on. IEEE, 41--51.
- Val Hendrix, Doug Benjamin, and Yushu Yao. 2012. Scientific Cluster Deployment and Recovery--Using Puppet to simplify cluster management. In Journal of Physics: Conference Series, Vol. 396. IOP Publishing, 042027.
- Joshua Higgins, Violeta Holmes, and Colin Venters. 2015. Orchestrating Docker containers in the HPC environment. In International Conference on High Performance Computing. Springer, 506--513.
- Gregory M Kurtzer, Vanessa Sochat, and Michael W Bauer. 2017. Singularity: Scientific containers for mobility of compute. PLoS ONE 12, 5 (2017), e0177459.
- Reid Priedhorsky and Tim Randles. 2017. Charliecloud: Unprivileged containers for user-defined software stacks in HPC. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 36.
- Vanessa Sochat. 2017. Singularity Registry: Open Source Registry for Singularity Images. Journal of Open Source Software 2, 18 (2017).
- Sébastien Varrette, Pascal Bouvry, Hyacinthe Cartiaux, and Fotis Georgatos. 2014. Management of an academic HPC cluster: The UL experience. In High Performance Computing & Simulation (HPCS), 2014 International Conference on. IEEE, 959--967.
- Veronica G Vergara Larrea, Wayne Joubert, and Christopher B Fuson. 2015. Use of Continuous Integration Tools for Application Performance Monitoring. Technical Report. Oak Ridge National Laboratory (ORNL); Oak Ridge Leadership Computing Facility (OLCF).
- Joseph Voss, Joe A Garcia, W Cyrus Proctor, and R Todd Evans. 2017. Automated System Health and Performance Benchmarking Platform: High Performance Computing Test Harness with Jenkins. In Proceedings of the HPC Systems Professionals Workshop. ACM, 1.
- Andy B Yoo, Morris A Jette, and Mark Grondona. 2003. Slurm: Simple Linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing. Springer, 44--60.
Index Terms
- Continuous Integration and Delivery for HPC: Using Singularity and Jenkins