VARQ: virtual advance reservations for queues
Source
High Performance Distributed Computing archive
Proceedings of the 17th International Symposium on High Performance Distributed Computing
Boston, MA, USA
SESSION: Reservations, leasing, and scheduling
Pages 75-86
Year of Publication: 2008
ISBN: 978-1-59593-997-5
Authors
Daniel Charles Nurmi  University of California Santa Barbara, Santa Barbara, USA
Rich Wolski  University of California Santa Barbara, Santa Barbara, USA
John Brevik  California State University Long Beach, Long Beach, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1383422.1383433

ABSTRACT

In high-performance computing (HPC) settings, in which multiprocessor machines are shared among users with potentially competing resource demands, processors are allocated to user workloads using space sharing. Typically, users interact with a given machine by submitting their jobs to a centralized batch scheduler that implements a site-specific policy designed to maximize machine utilization while providing tolerable turn-around times. To these users, the batch scheduler and the policies it implements are both critical operating system components, since they control how each job is serviced. In practice, while most HPC systems achieve good utilization levels, the time individual jobs spend waiting to begin execution has been shown to be highly variable and difficult to predict, leading to user confusion and frustration.

One proposed method for dealing with this uncertainty is to allow users who are willing to plan ahead to make "advance reservations" for processor resources. To date, however, few HPC centers provide an advance reservation capability to their general user populations, because previous research indicates that introducing advance reservations diminishes machine utilization.

In this work, we describe VARQ, a new method for job scheduling that provides users with probabilistic "virtual" advance reservations using only existing best-effort batch schedulers. VARQ functions as an overlay, submitting jobs that are indistinguishable from the normal workload serviced by a scheduler. We describe the statistical methods we use to implement VARQ, detail an empirical evaluation of its effectiveness in a number of HPC settings, and explore the potential future impact of VARQ should it become widely used. Without requiring HPC sites to support advance reservations, we find that VARQ can implement a reservation capability probabilistically and that this probabilistic approach is unlikely to reduce resource utilization.
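
The abstract does not spell out the statistical methods, so the following is only a minimal sketch of the general idea of a probabilistic reservation built on a best-effort queue. It is not the VARQ algorithm. It assumes a history of observed queue waits (in seconds) is available, takes an empirical quantile of that history as an upper bound on the wait, and submits an ordinary job early enough that it should have started by the reservation time with roughly that confidence. The function names, the plain empirical quantile, and the choice to pad the requested run time so an early-starting job can hold its processors until the reservation begins are all assumptions made for illustration.

```python
import math


def empirical_wait_bound(waits, quantile):
    """Return the smallest observed wait w such that at least
    `quantile` of the historical waits are <= w.

    This plain empirical quantile is an illustrative stand-in for
    whatever statistical bound a real system would use.
    """
    if not waits:
        raise ValueError("need at least one observed wait")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(waits)
    rank = math.ceil(quantile * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def plan_virtual_reservation(waits, start_time, run_seconds, quantile=0.95):
    """Plan an ordinary batch submission that approximates a reservation.

    Assumptions (hypothetical, not taken from the paper):
      * submitting `bound` seconds before `start_time` means the job has
        started by `start_time` with probability of about `quantile`;
      * the job may start early, so it requests `bound` extra seconds
        and idles until `start_time` before doing real work.
    """
    bound = empirical_wait_bound(waits, quantile)
    return {
        "submit_time": start_time - bound,
        "requested_seconds": run_seconds + bound,
    }


if __name__ == "__main__":
    history = [400, 100, 300, 200]  # made-up queue waits in seconds
    print(plan_virtual_reservation(history, start_time=1000,
                                   run_seconds=60, quantile=0.5))
```

From the scheduler's point of view the resulting submission is an ordinary job, which matches the overlay behavior the abstract describes. The cost of that choice in this sketch is the padded run-time request.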


