SpeQuloS: a QoS service for BoT applications using best effort distributed computing infrastructures

Authors:

Simon Delamare,

Oleg Lodygensky

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

Pages 173 - 186

https://doi.org/10.1145/2287076.2287106

Published: 18 June 2012

Abstract

Exploiting Best Effort Distributed Computing Infrastructures (BE-DCIs) allows operators to maximize the utilization of their infrastructures, and users to access unused resources at relatively low cost. Because providers do not guarantee that computing resources remain available to the user during the entire execution of an application, BE-DCIs offer a diminished Quality of Service (QoS) compared to traditional infrastructures. Profiling the execution of Bag-of-Tasks (BoT) applications on several kinds of BE-DCIs demonstrates that their task completion rate drops near the end of the execution.
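
The drop in completion rate near the end of a BoT execution can be quantified in a simple way. The sketch below is illustrative only and is not taken from the paper: the function name `tail_slowdown`, the `steady_fraction` parameter, and the definition of the "ideal" makespan (a linear extrapolation of the steady phase) are all assumptions made for this example.

```python
from typing import Sequence


def tail_slowdown(completion_times: Sequence[float],
                  steady_fraction: float = 0.9) -> float:
    """Ratio of the actual makespan to an extrapolated 'ideal' makespan.

    completion_times: time at which each task of the BoT finished,
        measured from the start of the execution.
    steady_fraction: share of the tasks assumed to belong to the steady
        phase (hypothetical parameter, not a value from the paper).

    The ideal makespan assumes the completion rate observed during the
    steady phase had been sustained until the last task.
    """
    if not completion_times:
        raise ValueError("completion_times must not be empty")
    times = sorted(completion_times)
    n = len(times)
    k = max(1, int(n * steady_fraction))  # tasks counted as steady phase
    t_steady = times[k - 1]               # time when the steady phase ends
    if t_steady <= 0:
        raise ValueError("completion times must be positive")
    ideal_makespan = t_steady * n / k     # constant-rate extrapolation
    return times[-1] / ideal_makespan
```

A value close to 1 means the last tasks completed at the same rate as the rest; larger values indicate that a few late tasks dominate the total execution time.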

In this paper, we present the SpeQuloS framework, which enhances the QoS of BoT applications executed on BE-DCIs by reducing the execution time, improving its stability, and reporting to users a predicted completion time. SpeQuloS monitors the execution of the BoT on the BE-DCIs, and dynamically supplies fast and reliable Cloud resources when the critical part of the BoT is executed. We present the design and development of the service and several strategies to decide when and how Cloud resources should be provisioned. Performance evaluation using simulations shows that SpeQuloS fulfills its objectives. It speeds up the execution of BoTs, in the best cases by a factor greater than 2, while offloading less than 2.5% of the workload to the Cloud. We report on preliminary results after a complex deployment as part of the European Desktop Grid Infrastructure.
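
As a rough illustration of what a "when and how" provisioning strategy might look like, the sketch below starts Cloud workers once a fixed share of the BoT has completed. It is a minimal example under stated assumptions, not the SpeQuloS implementation: the names `BotProgress`, `should_start_cloud`, and `cloud_workers_needed`, the 0.9 threshold, and the one-worker-per-remaining-task rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BotProgress:
    """Snapshot of a BoT execution as seen by a monitor (hypothetical type)."""
    total_tasks: int
    completed_tasks: int


def should_start_cloud(progress: BotProgress, threshold: float = 0.9) -> bool:
    """Decide WHEN to provision: trigger once the completed share of the
    BoT reaches the threshold (0.9 is an assumed value for illustration)."""
    if progress.total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return progress.completed_tasks / progress.total_tasks >= threshold


def cloud_workers_needed(progress: BotProgress, max_workers: int) -> int:
    """Decide HOW MANY to provision: one Cloud worker per remaining task,
    capped by a budget expressed as a maximum number of workers."""
    remaining = progress.total_tasks - progress.completed_tasks
    return max(0, min(remaining, max_workers))
```

Triggering only near the end of the execution keeps the share of work sent to the Cloud small, which matches the goal stated in the abstract of speeding up the BoT while offloading a small fraction of the workload.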

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Published In

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

June 2012

308 pages

ISBN: 9781450308052

DOI: 10.1145/2287076

General Chair: Dick Epema, Delft University of Technology and Eindhoven University of Technology, The Netherlands

Program Chairs: Thilo Kielmann, Vrije Universiteit, The Netherlands; Matei Ripeanu, The University of British Columbia, Canada

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

University of Arizona
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HPDC'12

HPDC'12: The 21st International Symposium on High-Performance Parallel and Distributed Computing

June 18 - 22, 2012

Delft, The Netherlands

Acceptance Rates

HPDC '12 Paper Acceptance Rate: 23 of 143 submissions, 16%

Overall Acceptance Rate: 166 of 966 submissions, 17%
