DOI: 10.1145/3167020.3167062
research-article

Automating Job Monitoring System for an Ecosystem of High Performance Computing

Published: 07 November 2017

ABSTRACT

Many countries have founded national high performance computing centers to provide computational resources to their scientists upon request. These resources are often used inefficiently because job requests do not match actual use, which leads to unnecessary resource consumption. In this paper, we present a method to monitor and manage High Performance Computing (HPC) resources more efficiently. Usually, HPC resources are managed by a Portable Batch System (PBS) acting as the Job Management System (JMS) for effective job scheduling and resource allocation. However, HPC resources are often tied up by inefficient job requests. For instance, a job may request four processors per node for two hours while its actual usage engages four processors per node for only one hour. Hence, the HPC resources lose an hour of productivity and, as a consequence, the queues for job execution grow longer. The automated job monitoring system proposed in this paper scans all the jobs on every HPC node and compares the conditions of each job request with preset criteria. If the conditions meet the criteria, the inefficient jobs are forcibly cancelled from the HPC queue. The results show that more HPC resources become available for executing other jobs in the queue, which saves resources in the HPC environment, stabilizes the HPC hardware, and promotes an HPC infrastructure ecosystem.
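
The scan-and-compare step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Job` record, its field names, and the two thresholds (`min_utilization`, `grace_hours`) are hypothetical stand-ins for the paper's "preset criteria", which the abstract does not specify. The sketch only selects job IDs. In a real PBS deployment those IDs would presumably be handed to the scheduler's cancel command (`qdel`).

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical snapshot of one job; field names are illustrative, not PBS output."""
    job_id: str
    requested_cpus: int    # processors per node requested by the job
    used_cpus: float       # processors per node actually busy (measured average)
    elapsed_hours: float   # how long the job has been observed running


def utilization(job: Job) -> float:
    """Fraction of the requested processors that the job actually uses."""
    if job.requested_cpus <= 0:
        return 0.0
    return job.used_cpus / job.requested_cpus


def find_inefficient_jobs(jobs, min_utilization=0.5, grace_hours=1.0):
    """Return IDs of jobs meeting the (assumed) cancellation criteria.

    A job is flagged only after it has run for at least `grace_hours`
    and its utilization is still below `min_utilization`.
    """
    return [
        job.job_id
        for job in jobs
        if job.elapsed_hours >= grace_hours and utilization(job) < min_utilization
    ]


jobs = [
    Job("101", requested_cpus=4, used_cpus=3.8, elapsed_hours=2.0),  # efficient
    Job("102", requested_cpus=4, used_cpus=0.4, elapsed_hours=2.0),  # mostly idle
    Job("103", requested_cpus=4, used_cpus=0.0, elapsed_hours=0.2),  # within grace period
]
to_cancel = find_inefficient_jobs(jobs)
print(to_cancel)
```

The grace period is a design choice of this sketch: it avoids cancelling jobs that are still starting up and have not yet reached steady use of their processors.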

Published in

MEDES '17: Proceedings of the 9th International Conference on Management of Digital EcoSystems
November 2017
299 pages
ISBN: 9781450348959
DOI: 10.1145/3167020

      Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

MEDES '17 Paper Acceptance Rate: 41 of 65 submissions, 63%. Overall Acceptance Rate: 267 of 682 submissions, 39%.