DOI: 10.1145/2479871.2479908
research-article

Towards building performance models for data-intensive workloads in public clouds

Published: 21 April 2013

ABSTRACT

The cloud computing paradigm provides the "illusion" of infinite resources and is therefore a promising candidate for large-scale data-intensive computing. In this paper, we explore experiment-driven performance models for data-intensive workloads executing in an infrastructure-as-a-service (IaaS) public cloud. The performance models help in predicting the workload behaviour, and serve as a key component of a larger framework for resource provisioning in the cloud. We determine a suitable prediction technique after comparing popular regression methods. We also enumerate the variables that impact variance in the workload performance in a public cloud. Finally, we build a performance model for a multi-tenant data service in the Amazon cloud. We find that a linear classifier is sufficient in most cases. On a few occasions, a linear classifier is unsuitable and non-linear modeling is required, which is time-consuming. Consequently, we recommend that a linear classifier be used in training the performance model in the first instance. If the resulting model is unsatisfactory, then non-linear modeling can be carried out in the next step.
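
The two-step recommendation above (train a linear model first, fall back to non-linear modeling only if the result is unsatisfactory) could be sketched roughly as follows. This is an illustrative sketch, not the method used in the paper: the single predictor, the R² acceptance threshold `min_r2`, and the use of k-nearest-neighbour regression as the non-linear stand-in are all assumptions made for the example, and every function name is hypothetical.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares for one predictor; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression (assumed stand-in for a non-linear model)."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def r_squared(model, xs, ys):
    """Coefficient of determination on held-out data (ys must not be constant)."""
    my = mean(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def train_performance_model(train, valid, min_r2=0.9):
    """Try the cheap linear model first; fall back only if it validates poorly.

    min_r2 is a hypothetical acceptance threshold, not a value from the paper.
    """
    (tx, ty), (vx, vy) = train, valid
    linear = fit_linear(tx, ty)
    if r_squared(linear, vx, vy) >= min_r2:
        return "linear", linear
    return "non-linear", fit_knn(tx, ty)
```

Here `train` and `valid` are each a pair of lists, for example a workload parameter and the measured performance metric. Ordering the steps this way means the more expensive non-linear training runs only for workloads where the linear fit fails validation.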

Published in

ICPE '13: Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
April 2013, 446 pages
ISBN: 9781450316361
DOI: 10.1145/2479871

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICPE '13 paper acceptance rate: 28 of 64 submissions, 44%. Overall acceptance rate: 252 of 851 submissions, 30%.
