skip to main content
research-article
Open access

ADAPT: A framework for coscheduling multithreaded programs

Published: 20 January 2013 Publication History

Abstract

Since multicore systems offer greater performance via parallelism, future computing is progressing towards use of multicore machines with large number of cores. However, the performance of emerging multithreaded programs often does not scale to fully utilize the available cores. Therefore, simultaneously running multiple multithreaded applications becomes inevitable to fully exploit the computing potential of such machines. However, maximizing the performance and throughput on multicore machines in the presence of multiple multithreaded programs is a challenge for the OS. We have observed that the state-of-the-art contention management algorithms fail to effectively coschedule multithreaded programs on multicore machines. To address the above challenge, we present ADAPT, a scheduling framework that continuously monitors the resource usage of multithreaded programs and adaptively coschedules them such that they interfere with each other's performance as little as possible. In addition, ADAPT selects appropriate memory allocation and scheduling policies according to the workload characteristics. We have implemented ADAPT on a 64-core Supermicro server running Solaris 11 and evaluated it using 26 multithreaded programs including the TATP database application, SPECjbb2005, and programs from Phoenix, PARSEC, and SPEC OMP suites. The experimental results show that ADAPT substantially improves total turnaround time and system utilization relative to the default Solaris 11 scheduler.

References

[1]
Barham, P., Black, R., Goldszmidt, M., Isaacs, R., MacCormick, J., Mortier, R., and Simma., A. 2008. Constellation: Automated discovery of service and host dependencies in networked systems. Tech. rep. (MSR-TR-2008-67), Microsoft Research.
[2]
Bhadauria, M. and McKee, S. A. 2010. An approach to resource-aware co-scheduling for cmps. In Proceedings of the 24th ACM International Conference on Supercomputing (ICS'10). ACM, New York, 189--199.
[3]
Bienia, C., Kumar, S., Singh, J. P., and Li, K. 2008. The parsec benchmark suite: characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT'08). ACM, New York, 72--81.
[4]
Blagodurov, S., Zhuravlev, S., Dashti, M., and Fedorova, A. 2011. A case for numa-aware contention management on multicore systems. In Proceedings of the USENIX Annual Technical Conference (USENIXATC'11). USENIX Association, 1.
[5]
Boyd-Wickizer, S., Morris, R., and Kaashoek, M. F. 2009. Reinventing scheduling for multicore systems. In Proceedings of the 12th Conference on Hot Topics in Operating Systems (HotOS'09). USENIX Association, 1.
[6]
Brecht, T. 1993. On the importance of parallel application placement in numa multiprocessors. In Proceedings of the USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems Vol. 4. Sedms'93. USENIX Association, 1.
[7]
Cantrill, B. M., Shapiro, M. W., and Leventhal, A. H. 2004. Dynamic instrumentation of production systems. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC'04). USENIX Association, 2.
[8]
Chandra, R., Devine, S., Verghese, B., Gupta, A., and Rosenblum, M. 1994. Scheduling and page migration for multiprocessor compute servers. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI). ACM, New York, 12--24.
[9]
Corbalán, J., Martorell, X., and Labarta, J. 2000. Performance-driven processor allocation. In Proceedings of the 4th Symposium on Operating System Design & Implementation. Vol. 4. (OSDI'00). USENIX Association, 5.
[10]
Corbalan, J., Martorell, X., and Labarta, J. 2003. Evaluation of the memory page migration influence in the system performance: the case of the sgi o2000. In Proceedings of the 17th Annual International Conference on Supercomputing (ICS'03). ACM, New York, 121--129.
[11]
Eyerman, S. and Eeckhout, L. 2008. System-level performance metrics for multiprogram workloads. IEEE Micro 28, 3, 42--53.
[12]
Gamsa, B., Krieger, O., Appavoo, J., and Stumm, M. 1999. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (OSDI'99). USENIX Association, Berkeley, 87--100.
[13]
Gupta, A., Tucker, A., and Urushibara, S. 1991. The impact of operating system scheduling policies and synchronization methods of performance of parallel applications. SIGMETRICS Perform. Eval. Rev. 19, 120--132.
[14]
Hastie, T., Tibshirani, R., and Friedman, J. H. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer Series in Statistics.
[15]
Ipek, E., Mutlu, O., Martínez, J. F., and Caruana, R. 2008. Self-Optimizing memory controllers: A reinforcement learning approach. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA'08). IEEE Computer Society, Washington, DC, USA, 39--50.
[16]
Johnson, F. R., Stoica, R., Ailamaki, A., and Mowry, T. C. 2010. Decoupling contention management from scheduling. In Proceedings of the 15th edition of ASPLOS on Architectural support for Programming Languages and Operating Systems (ASPLOS'10). ACM, New York, 117--128.
[17]
Knauerhase, R., Brett, P., Hohlt, B., Li, T., and Hahn, S. 2008. Using os observations to improve performance in multicore systems. IEEE Micro 28, 3, 54--66.
[18]
LaRowe, Jr., R. P., Ellis, C. S., and Holliday, M. A. 1992. Evaluation of numa memory management through modeling and measurements. IEEE Trans. Parallel Distrib. Syst. 3, 6, 686--701.
[19]
Li, T., Baumberger, D., Koufaty, D. A., and Hahn, S. 2007. Efficient operating system scheduling for performance-asymmetric multi-core architectures. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'07). ACM, New York, 53:1--53:11.
[20]
McDougall, R. and Mauro, J. 2006. Solaris Internals, 2nd Ed. Prentice Hall.
[21]
McDougall, R., Mauro, J., and Gregg, B. 2006. Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris. Prentice Hall.
[22]
McGregor, R. L., Antonopoulos, C. D., and Nikolopoulos, D. S. 2005. Scheduling algorithms for effective thread pairing on hybrid multiprocessors. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium - Papers - Vol. 01 (IPDPS'05). IEEE Computer Society, USA, 28.1--.
[23]
Merkel, A., Stoess, J., and Bellosa, F. 2010. Resource-conscious scheduling for energy efficiency on multicore processors. In Proceedings of the 5th European Conference on Computer Systems (EuroSys'10). ACM, New York, 153--166.
[24]
Moore, R. W. and Childers, B. R. 2012. Using utility prediction models to dynamically choose program thread counts. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'12), R. Balasubramonian and V. Srinivasan, Eds. IEEE, 135--144.
[25]
Narayanan, D. and Satyanarayanan, M. 2003. Predictive resource management for wearable computing. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys'03). ACM, New York, 113--128.
[26]
Padala, P., Hou, K.-Y., Shin, K. G., Zhu, X., Uysal, M., Wang, Z., Singhal, S., and Merchant, A. 2009. Automated control of multiple virtualized resources. In Proceedings of the 4th ACM European Conference on Computer Systems (EuroSys'09). ACM, New York, 13--26.
[27]
Padala, P., Shin, K. G., Zhu, X., Uysal, M., Wang, Z., Singhal, S., Merchant, A., and Salem, K. 2007. Adaptive control of virtualized resources in utility computing environments. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys'07). ACM, New York, 289--302.
[28]
Pekhimenko, G. and Brown, A. D. 2008. Machine learning algorithms for choosing compiler heuristics. Tech. rep. MSc. Thesis CS Department, University of Toronto.
[29]
Peter, S., Schüpbach, A., Barham, P., Baumann, A., Isaacs, R., Harris, T., and Roscoe, T. 2010. Design principles for end-to-end multicore schedulers. In Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism (HotPar'10). USENIX Association, 10.
[30]
Pusukuri, K., Gupta, R., and Bhuyan, L. 2011a. Thread reinforcer: Dynamically determining number of threads via os level monitoring. In Proceedings of the International Symposium on Workload Characterization (IISWC'11). IEEE Computer Society, 116--125.
[31]
Pusukuri, K. K., Gupta, R., and Bhuyan, L. N. 2011b. No more backstabbing… A faithful scheduling policy for multithreaded programs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'11). IEEE Computer Society, Washington, DC, 12--21.
[32]
Pusukuri, K. K., Gupta, R., and Bhuyan, L. N. 2012. Thread tranquilizer: Dynamically reducing performance variation. ACM Trans. Archit. Code Optim. 8, 4, 46:1--46:21.
[33]
Pusukuri, K. K., Vengerov, D., Fedorova, A., and Kalogeraki, V. 2011c. Fact: A framework for adaptive contention-aware thread migrations. In Proceedings of the 8th ACM International Conference on Computing Frontiers (CF'11). ACM, New York, 35:1--35:10.
[34]
R. lm(), stepaic(), prune(), vif(), rpart(), kknn(). http://www.statmethods.net/.
[35]
Severance, C. and Enbody, R. J. 1997. Comparing gang scheduling with dynamic space sharing on symmetric multiprocessors using automatic self-allocating threads (asat). In Proceedings of the 11th International Symposium on Parallel Processing (IPPS'97). IEEE Computer Society, 288.
[36]
solidDB. IBM soliddb 6.5 (build 2010-10-04). https://www-304.ibm.com/support/docview.wss?uid=swg24028071.
[37]
SPECjbb. 2005. http://www.spec.org/jbb2005.
[38]
SPECOMP. 2001. http://www.spec.org/omp.
[39]
Tam, D., Azimi, R., and Stumm, M. 2007. Thread clustering: Sharing-Aware scheduling on smp-cmp-smt multiprocessors. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys'07). ACM, New York, 47--58.
[40]
TATP. 2003. IBM telecom application transaction processing benchmark description. http://tatpbench-mark.sourceforge.net.
[41]
vif. Multicollinearity. http://en.wikipedia.org/wiki/Multicollinearity.
[42]
VMware. 2005. Vmware esx server 2 numa support. white paper. Tech. rep. http://www.vmware.com/pdf/esx2_NUMA.pdf.
[43]
Yoo, R. M., Romano, A., and Kozyrakis, C. 2009. Phoenix rebirth: Scalable mapreduce on a large-scale shared-memory system. In Proceedings of the IEEE International Symposium on Workload Characterization. (IISWC'09). IEEE Computer Society, 198--207.
[44]
Zhuravlev, S., Blagodurov, S., and Fedorova, A. 2010. Addressing shared resource contention in multicore processors via scheduling. In Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS'10). ACM, New York, 129--142.

Cited By

View all
  • (2023)Adapt Burstable Containers to Variable CPU ResourcesIEEE Transactions on Computers10.1109/TC.2022.317448072:3(614-626)Online publication date: 1-Mar-2023
  • (2020)Multi-Level Execution Trace Based Lock Contention Analysis2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)10.1109/ISSREW51248.2020.00068(177-182)Online publication date: Oct-2020
  • (2019)DICERProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337891(1-10)Online publication date: 5-Aug-2019
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 9, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2013
876 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2400682
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2013
Accepted: 01 November 2012
Revised: 01 November 2012
Received: 01 June 2012
Published in TACO Volume 9, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Multicore
  2. cache miss-rate
  3. coscheduling
  4. fairness
  5. lock contention

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)87
  • Downloads (Last 6 weeks)15
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Adapt Burstable Containers to Variable CPU ResourcesIEEE Transactions on Computers10.1109/TC.2022.317448072:3(614-626)Online publication date: 1-Mar-2023
  • (2020)Multi-Level Execution Trace Based Lock Contention Analysis2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)10.1109/ISSREW51248.2020.00068(177-182)Online publication date: Oct-2020
  • (2019)DICERProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337891(1-10)Online publication date: 5-Aug-2019
  • (2019)Using Variability as a Guiding Principle to Reduce Latency in Web Applications via OS ProfilingThe World Wide Web Conference10.1145/3308558.3313406(1759-1770)Online publication date: 13-May-2019
  • (2018)Maximizing system utilization via parallelism management for co-located parallel applicationsProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243199(1-14)Online publication date: 1-Nov-2018
  • (2017)Jumbler: A lock-contention aware thread scheduler for multi-core parallel machines2017 International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom)10.1109/SIGTELCOM.2017.7849799(77-81)Online publication date: Jan-2017
  • (2016)Safe and flexible adaptation via alternate data structure representationsProceedings of the 25th International Conference on Compiler Construction10.1145/2892208.2892220(34-44)Online publication date: 17-Mar-2016
  • (2016)VarySched: A Framework for Variable Scheduling in Heterogeneous Environments2016 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER.2016.19(489-492)Online publication date: Sep-2016
  • (2015)Thread and memory placement on NUMA systemsProceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference10.5555/2813767.2813788(277-289)Online publication date: 8-Jul-2015
  • (2015)TumblerACM Transactions on Architecture and Code Optimization10.1145/282769812:4(1-24)Online publication date: 16-Nov-2015
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media