DOI: 10.1145/2287076.2287106
research-article

SpeQuloS: a QoS service for BoT applications using best effort distributed computing infrastructures

Published: 18 June 2012

Abstract

Exploiting Best Effort Distributed Computing Infrastructures (BE-DCIs) allows operators to maximize the utilization of their infrastructures and allows users to access unused resources at relatively low cost. Because providers do not guarantee that computing resources remain available to the user for the entire execution of an application, BE-DCIs offer a diminished Quality of Service (QoS) compared to traditional infrastructures. Profiling the execution of Bag-of-Tasks (BoT) applications on several kinds of BE-DCIs shows that their task completion rate drops near the end of the execution.
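
The tail effect described above can be made concrete with a small profiling helper. The following Python sketch is illustrative only: counting completions in fixed time windows and flagging the first window whose count falls below half of the best earlier window (the `drop_ratio` heuristic) are assumptions made for this example, not the profiling method used in the paper, and `completions_per_window` and `tail_start` are hypothetical names.

```python
from typing import List, Optional


def completions_per_window(completion_times: List[float], window: float) -> List[int]:
    """Number of tasks that finished in each consecutive time window of width `window`."""
    if not completion_times:
        return []
    counts = [0] * (int(max(completion_times) // window) + 1)
    for t in completion_times:
        counts[int(t // window)] += 1
    return counts


def tail_start(completion_times: List[float], window: float,
               drop_ratio: float = 0.5) -> Optional[float]:
    """Hypothetical tail detector: return the start time of the first window whose
    completion count is below `drop_ratio` times the highest count seen in any
    earlier window, or None if the completion rate never drops that far."""
    counts = completions_per_window(completion_times, window)
    if not counts:
        return None
    peak = counts[0]
    for i in range(1, len(counts)):
        if counts[i] < drop_ratio * peak:
            return i * window
        peak = max(peak, counts[i])
    return None


if __name__ == "__main__":
    # 100 tasks finish evenly during the first 10 s, then three stragglers trickle in.
    times = [i / 10 for i in range(100)] + [12.0, 15.0, 19.0]
    print(tail_start(times, window=1.0))
```

With this synthetic trace, the detector reports that the tail begins at 10.0 seconds, even though only 3 of the 103 tasks remain at that point.
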
In this paper, we present the SpeQuloS framework, which enhances the QoS of BoT applications executed on BE-DCIs by reducing their execution time, improving its stability, and reporting a predicted completion time to users. SpeQuloS monitors the execution of the BoT on the BE-DCIs and dynamically supplies fast and reliable Cloud resources when the critical part of the BoT is being executed. We present the design and development of the service and several strategies for deciding when and how Cloud resources should be provisioned. Performance evaluation using simulations shows that SpeQuloS fulfills its objectives: it speeds up the execution of BoTs, in the best cases by a factor greater than 2, while offloading less than 2.5% of the workload to the Cloud. We also report preliminary results from a complex deployment as part of the European Desktop Grid Infrastructure.
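
The abstract does not detail the provisioning strategies themselves. As a hedged illustration of the kind of rule such a service could apply, the Python sketch below assumes a completion-threshold trigger (start Cloud workers once 90% of the tasks have completed), a worker count bounded by a credit budget, and a naive linear completion-time estimate. `BotProgress`, `should_start_cloud`, `cloud_workers_to_start`, `naive_completion_estimate`, the 0.9 threshold and the credit model are all hypothetical and are not SpeQuloS's actual interface or strategies.

```python
from dataclasses import dataclass


@dataclass
class BotProgress:
    """Hypothetical snapshot of a BoT execution as seen by a monitoring service."""
    total_tasks: int
    completed_tasks: int
    elapsed: float  # seconds since the BoT was submitted


def should_start_cloud(p: BotProgress, threshold: float = 0.9) -> bool:
    """Assumed completion-threshold trigger: request Cloud workers once the
    completed fraction reaches `threshold`, i.e. roughly when the tail begins."""
    return p.completed_tasks >= threshold * p.total_tasks


def cloud_workers_to_start(remaining_tasks: int, credits: float,
                           cost_per_worker: float, max_workers: int) -> int:
    """At most one worker per remaining task, bounded by the budget and a hard cap."""
    affordable = int(credits // cost_per_worker)
    return max(0, min(remaining_tasks, affordable, max_workers))


def naive_completion_estimate(p: BotProgress) -> float:
    """Linear extrapolation of the total BoT execution time from the average rate so far."""
    if p.completed_tasks == 0:
        raise ValueError("no completed tasks yet")
    return p.elapsed * p.total_tasks / p.completed_tasks


if __name__ == "__main__":
    p = BotProgress(total_tasks=1000, completed_tasks=920, elapsed=3600.0)
    if should_start_cloud(p):
        n = cloud_workers_to_start(p.total_tasks - p.completed_tasks,
                                   credits=50.0, cost_per_worker=2.0, max_workers=100)
        print(f"start {n} cloud workers")
    print(f"naive estimate: {naive_completion_estimate(p):.0f} s")
```

A linear estimate like `naive_completion_estimate` is optimistic whenever the completion rate drops near the end of the execution, so a usable predictor has to account for the tail rather than extrapolate the average rate.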

Information

Published In

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
June 2012
308 pages
ISBN:9781450308052
DOI:10.1145/2287076
  • General Chair: Dick Epema
  • Program Chairs: Thilo Kielmann, Matei Ripeanu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. QoS
  2. cloud
  3. distributed computing infrastructures
  4. grids

Conference

HPDC'12

Acceptance Rates

HPDC '12 Paper Acceptance Rate 23 of 143 submissions, 16%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Cited By

  • (2020) Cloud Resource Optimization System Based on Time and Cost. International Journal of Mathematical, Engineering and Management Sciences, 5:4 (758-768). DOI: 10.33889/IJMEMS.2020.5.4.060. Online publication date: 1-Aug-2020.
  • (2020) A survey and taxonomy on workload scheduling and resource provisioning in hybrid clouds. Cluster Computing. DOI: 10.1007/s10586-020-03048-8. Online publication date: 5-Feb-2020.
  • (2019) ALICE Connex: A volunteer computing platform for the Time-Of-Flight calibration of the ALICE experiment. An opportunistic use of CPU cycles on Android devices. Future Generation Computer Systems, 94 (510-523). DOI: 10.1016/j.future.2018.11.057. Online publication date: May-2019.
  • (2018) Dynamically negotiating capacity between on-demand and batch clusters. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-11). DOI: 10.5555/3291656.3291707. Online publication date: 11-Nov-2018.
  • (2018) Dynamically negotiating capacity between on-demand and batch clusters. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-11). DOI: 10.1109/SC.2018.00041. Online publication date: 11-Nov-2018.
  • (2017) Fast-Sec: an approach to secure Big Data processing in the cloud. International Journal of Parallel, Emergent and Distributed Systems (1-16). DOI: 10.1080/17445760.2017.1334777. Online publication date: 14-Jun-2017.
  • (2017) Adaptive resource provisioning method using application-aware machine learning based on job history in heterogeneous infrastructures. Cluster Computing, 20:4 (3537-3549). DOI: 10.1007/s10586-017-1148-1. Online publication date: 1-Dec-2017.
  • (2016) Tasklets: "Better than Best-Effort" Computing. 2016 25th International Conference on Computer Communication and Networks (ICCCN) (1-11). DOI: 10.1109/ICCCN.2016.7568580. Online publication date: Aug-2016.
  • (2016) Strategies for Big Data Analytics through Lambda Architectures in Volatile Environments. IFAC-PapersOnLine, 49:30 (114-119). DOI: 10.1016/j.ifacol.2016.11.138. Online publication date: 2016.
  • (2016) A Survey on Resource Scheduling in Cloud Computing. Journal of Grid Computing, 14:2 (217-264). DOI: 10.1007/s10723-015-9359-2. Online publication date: 1-Jun-2016.
