research-article

Open access

ADAPT: A framework for coscheduling multithreaded programs

Authors:

Kishore Kumar Pusukuri,

Laxmi N. BhuyanAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 9, Issue 4

Article No.: 45, Pages 1 - 24

https://doi.org/10.1145/2400682.2400704

Published: 20 January 2013 Publication History

Abstract

Since multicore systems offer greater performance via parallelism, future computing is progressing towards use of multicore machines with large number of cores. However, the performance of emerging multithreaded programs often does not scale to fully utilize the available cores. Therefore, simultaneously running multiple multithreaded applications becomes inevitable to fully exploit the computing potential of such machines. However, maximizing the performance and throughput on multicore machines in the presence of multiple multithreaded programs is a challenge for the OS. We have observed that the state-of-the-art contention management algorithms fail to effectively coschedule multithreaded programs on multicore machines. To address the above challenge, we present ADAPT, a scheduling framework that continuously monitors the resource usage of multithreaded programs and adaptively coschedules them such that they interfere with each other's performance as little as possible. In addition, ADAPT selects appropriate memory allocation and scheduling policies according to the workload characteristics. We have implemented ADAPT on a 64-core Supermicro server running Solaris 11 and evaluated it using 26 multithreaded programs including the TATP database application, SPECjbb2005, and programs from Phoenix, PARSEC, and SPEC OMP suites. The experimental results show that ADAPT substantially improves total turnaround time and system utilization relative to the default Solaris 11 scheduler.

References

[1]

Barham, P., Black, R., Goldszmidt, M., Isaacs, R., MacCormick, J., Mortier, R., and Simma., A. 2008. Constellation: Automated discovery of service and host dependencies in networked systems. Tech. rep. (MSR-TR-2008-67), Microsoft Research.

[2]

Bhadauria, M. and McKee, S. A. 2010. An approach to resource-aware co-scheduling for cmps. In Proceedings of the 24th ACM International Conference on Supercomputing (ICS'10). ACM, New York, 189--199.

Digital Library

[3]

Bienia, C., Kumar, S., Singh, J. P., and Li, K. 2008. The parsec benchmark suite: characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT'08). ACM, New York, 72--81.

Digital Library

[4]

Blagodurov, S., Zhuravlev, S., Dashti, M., and Fedorova, A. 2011. A case for numa-aware contention management on multicore systems. In Proceedings of the USENIX Annual Technical Conference (USENIXATC'11). USENIX Association, 1.

Digital Library

[5]

Boyd-Wickizer, S., Morris, R., and Kaashoek, M. F. 2009. Reinventing scheduling for multicore systems. In Proceedings of the 12th Conference on Hot Topics in Operating Systems (HotOS'09). USENIX Association, 1.

Digital Library

[6]

Brecht, T. 1993. On the importance of parallel application placement in numa multiprocessors. In Proceedings of the USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems Vol. 4. Sedms'93. USENIX Association, 1.

Digital Library

[7]

Cantrill, B. M., Shapiro, M. W., and Leventhal, A. H. 2004. Dynamic instrumentation of production systems. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC'04). USENIX Association, 2.

Digital Library

[8]

Chandra, R., Devine, S., Verghese, B., Gupta, A., and Rosenblum, M. 1994. Scheduling and page migration for multiprocessor compute servers. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI). ACM, New York, 12--24.

Digital Library

[9]

Corbalán, J., Martorell, X., and Labarta, J. 2000. Performance-driven processor allocation. In Proceedings of the 4th Symposium on Operating System Design & Implementation. Vol. 4. (OSDI'00). USENIX Association, 5.

Digital Library

[10]

Corbalan, J., Martorell, X., and Labarta, J. 2003. Evaluation of the memory page migration influence in the system performance: the case of the sgi o2000. In Proceedings of the 17th Annual International Conference on Supercomputing (ICS'03). ACM, New York, 121--129.

Digital Library

[11]

Eyerman, S. and Eeckhout, L. 2008. System-level performance metrics for multiprogram workloads. IEEE Micro 28, 3, 42--53.

Digital Library

[12]

Gamsa, B., Krieger, O., Appavoo, J., and Stumm, M. 1999. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (OSDI'99). USENIX Association, Berkeley, 87--100.

Digital Library

[13]

Gupta, A., Tucker, A., and Urushibara, S. 1991. The impact of operating system scheduling policies and synchronization methods of performance of parallel applications. SIGMETRICS Perform. Eval. Rev. 19, 120--132.

Digital Library

[14]

Hastie, T., Tibshirani, R., and Friedman, J. H. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer Series in Statistics.

[15]

Ipek, E., Mutlu, O., Martínez, J. F., and Caruana, R. 2008. Self-Optimizing memory controllers: A reinforcement learning approach. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA'08). IEEE Computer Society, Washington, DC, USA, 39--50.

Digital Library

[16]

Johnson, F. R., Stoica, R., Ailamaki, A., and Mowry, T. C. 2010. Decoupling contention management from scheduling. In Proceedings of the 15th edition of ASPLOS on Architectural support for Programming Languages and Operating Systems (ASPLOS'10). ACM, New York, 117--128.

Digital Library

[17]

Knauerhase, R., Brett, P., Hohlt, B., Li, T., and Hahn, S. 2008. Using os observations to improve performance in multicore systems. IEEE Micro 28, 3, 54--66.

Digital Library

[18]

LaRowe, Jr., R. P., Ellis, C. S., and Holliday, M. A. 1992. Evaluation of numa memory management through modeling and measurements. IEEE Trans. Parallel Distrib. Syst. 3, 6, 686--701.

Digital Library

[19]

Li, T., Baumberger, D., Koufaty, D. A., and Hahn, S. 2007. Efficient operating system scheduling for performance-asymmetric multi-core architectures. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'07). ACM, New York, 53:1--53:11.

Digital Library

[20]

McDougall, R. and Mauro, J. 2006. Solaris Internals, 2nd Ed. Prentice Hall.

Digital Library

[21]

McDougall, R., Mauro, J., and Gregg, B. 2006. Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris. Prentice Hall.

Digital Library

[22]

McGregor, R. L., Antonopoulos, C. D., and Nikolopoulos, D. S. 2005. Scheduling algorithms for effective thread pairing on hybrid multiprocessors. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium - Papers - Vol. 01 (IPDPS'05). IEEE Computer Society, USA, 28.1--.

Digital Library

[23]

Merkel, A., Stoess, J., and Bellosa, F. 2010. Resource-conscious scheduling for energy efficiency on multicore processors. In Proceedings of the 5th European Conference on Computer Systems (EuroSys'10). ACM, New York, 153--166.

Digital Library

[24]

Moore, R. W. and Childers, B. R. 2012. Using utility prediction models to dynamically choose program thread counts. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'12), R. Balasubramonian and V. Srinivasan, Eds. IEEE, 135--144.

Digital Library

[25]

Narayanan, D. and Satyanarayanan, M. 2003. Predictive resource management for wearable computing. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys'03). ACM, New York, 113--128.

Digital Library

[26]

Padala, P., Hou, K.-Y., Shin, K. G., Zhu, X., Uysal, M., Wang, Z., Singhal, S., and Merchant, A. 2009. Automated control of multiple virtualized resources. In Proceedings of the 4th ACM European Conference on Computer Systems (EuroSys'09). ACM, New York, 13--26.

Digital Library

[27]

Padala, P., Shin, K. G., Zhu, X., Uysal, M., Wang, Z., Singhal, S., Merchant, A., and Salem, K. 2007. Adaptive control of virtualized resources in utility computing environments. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys'07). ACM, New York, 289--302.

Digital Library

[28]

Pekhimenko, G. and Brown, A. D. 2008. Machine learning algorithms for choosing compiler heuristics. Tech. rep. MSc. Thesis CS Department, University of Toronto.

[29]

Peter, S., Schüpbach, A., Barham, P., Baumann, A., Isaacs, R., Harris, T., and Roscoe, T. 2010. Design principles for end-to-end multicore schedulers. In Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism (HotPar'10). USENIX Association, 10.

Digital Library

[30]

Pusukuri, K., Gupta, R., and Bhuyan, L. 2011a. Thread reinforcer: Dynamically determining number of threads via os level monitoring. In Proceedings of the International Symposium on Workload Characterization (IISWC'11). IEEE Computer Society, 116--125.

Digital Library

[31]

Pusukuri, K. K., Gupta, R., and Bhuyan, L. N. 2011b. No more backstabbing… A faithful scheduling policy for multithreaded programs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'11). IEEE Computer Society, Washington, DC, 12--21.

Digital Library

[32]

Pusukuri, K. K., Gupta, R., and Bhuyan, L. N. 2012. Thread tranquilizer: Dynamically reducing performance variation. ACM Trans. Archit. Code Optim. 8, 4, 46:1--46:21.

Digital Library

[33]

Pusukuri, K. K., Vengerov, D., Fedorova, A., and Kalogeraki, V. 2011c. Fact: A framework for adaptive contention-aware thread migrations. In Proceedings of the 8th ACM International Conference on Computing Frontiers (CF'11). ACM, New York, 35:1--35:10.

Digital Library

[34]

R. lm(), stepaic(), prune(), vif(), rpart(), kknn(). http://www.statmethods.net/.

[35]

Severance, C. and Enbody, R. J. 1997. Comparing gang scheduling with dynamic space sharing on symmetric multiprocessors using automatic self-allocating threads (asat). In Proceedings of the 11th International Symposium on Parallel Processing (IPPS'97). IEEE Computer Society, 288.

Digital Library

[36]

solidDB. IBM soliddb 6.5 (build 2010-10-04). https://www-304.ibm.com/support/docview.wss&quest;uid=swg24028071.

[37]

SPECjbb. 2005. http://www.spec.org/jbb2005.

[38]

SPECOMP. 2001. http://www.spec.org/omp.

[39]

Tam, D., Azimi, R., and Stumm, M. 2007. Thread clustering: Sharing-Aware scheduling on smp-cmp-smt multiprocessors. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys'07). ACM, New York, 47--58.

Digital Library

[40]

TATP. 2003. IBM telecom application transaction processing benchmark description. http://tatpbench-mark.sourceforge.net.

[41]

vif. Multicollinearity. http://en.wikipedia.org/wiki/Multicollinearity.

[42]

VMware. 2005. Vmware esx server 2 numa support. white paper. Tech. rep. http://www.vmware.com/pdf/esx2_NUMA.pdf.

[43]

Yoo, R. M., Romano, A., and Kozyrakis, C. 2009. Phoenix rebirth: Scalable mapreduce on a large-scale shared-memory system. In Proceedings of the IEEE International Symposium on Workload Characterization. (IISWC'09). IEEE Computer Society, 198--207.

Digital Library

[44]

Zhuravlev, S., Blagodurov, S., and Fedorova, A. 2010. Addressing shared resource contention in multicore processors via scheduling. In Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS'10). ACM, New York, 129--142.

Digital Library

Cited By

Huang HZhao YRao JWu SJin HWang DKun SPan L(2023)Adapt Burstable Containers to Variable CPU ResourcesIEEE Transactions on Computers10.1109/TC.2022.317448072:3(614-626)Online publication date: 1-Mar-2023
https://doi.org/10.1109/TC.2022.3174480
Rezazadeh MEzzati-Jivan NGalea EDagenais M(2020)Multi-Level Execution Trace Based Lock Contention Analysis2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)10.1109/ISSREW51248.2020.00068(177-182)Online publication date: Oct-2020
https://doi.org/10.1109/ISSREW51248.2020.00068
Nikas KPapadopoulou NGiantsidi DKarakostas VGoumas GKoziris N(2019)DICERProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337891(1-10)Online publication date: 5-Aug-2019
https://dl.acm.org/doi/10.1145/3337821.3337891
Show More Cited By

Index Terms

ADAPT: A framework for coscheduling multithreaded programs
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
    2. Extra-functional properties
      1. Software performance

Recommendations

Thread Tranquilizer: Dynamically reducing performance variation
Special Issue on High-Performance Embedded Architectures and Compilers

To realize the performance potential of multicore systems, we must effectively manage the interactions between memory reference behavior and the operating system policies for thread scheduling and migration decisions. We observe that these interactions ...
Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

On a cache-coherent multicore multiprocessor system, the performance of a multithreaded application with high lock contention is very sensitive to the distribution of application threads across multiple processors (or Sockets). This is because the ...
Lock contention aware thread migrations
PPoPP '14

On a cache-coherent multicore multiprocessor system, the performance of a multithreaded application with high lock contention is very sensitive to the distribution of application threads across multiple processors. This is because the distribution of ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 9, Issue 4

Special Issue on High-Performance Embedded Architectures and Compilers

January 2013

876 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2400682

Issue’s Table of Contents

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2013

Accepted: 01 November 2012

Revised: 01 November 2012

Received: 01 June 2012

Published in TACO Volume 9, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

16
Total Citations
View Citations
810
Total Downloads

Downloads (Last 12 months)87
Downloads (Last 6 weeks)15

Reflects downloads up to 06 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Huang HZhao YRao JWu SJin HWang DKun SPan L(2023)Adapt Burstable Containers to Variable CPU ResourcesIEEE Transactions on Computers10.1109/TC.2022.317448072:3(614-626)Online publication date: 1-Mar-2023
https://doi.org/10.1109/TC.2022.3174480
Rezazadeh MEzzati-Jivan NGalea EDagenais M(2020)Multi-Level Execution Trace Based Lock Contention Analysis2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)10.1109/ISSREW51248.2020.00068(177-182)Online publication date: Oct-2020
https://doi.org/10.1109/ISSREW51248.2020.00068
Nikas KPapadopoulou NGiantsidi DKarakostas VGoumas GKoziris N(2019)DICERProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337891(1-10)Online publication date: 5-Aug-2019
https://dl.acm.org/doi/10.1145/3337821.3337891
Suresh AGandhi A(2019)Using Variability as a Guiding Principle to Reduce Latency in Web Applications via OS ProfilingThe World Wide Web Conference10.1145/3308558.3313406(1759-1770)Online publication date: 13-May-2019
https://dl.acm.org/doi/10.1145/3308558.3313406
Cho YGuzman CEgger BEvripidou SStenström PO'Boyle M(2018)Maximizing system utilization via parallelism management for co-located parallel applicationsProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243199(1-14)Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1145/3243176.3243199
Nisar UAleem MIqbal MVo N(2017)Jumbler: A lock-contention aware thread scheduler for multi-core parallel machines2017 International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom)10.1109/SIGTELCOM.2017.7849799(77-81)Online publication date: Jan-2017
https://doi.org/10.1109/SIGTELCOM.2017.7849799
Kusum ANeamtiu IGupta RZaks AHermenegildo M(2016)Safe and flexible adaptation via alternate data structure representationsProceedings of the 25th International Conference on Compiler Construction10.1145/2892208.2892220(34-44)Online publication date: 17-Mar-2016
https://dl.acm.org/doi/10.1145/2892208.2892220
SuB TDoring NGad RNagel LBrinkmann AFeld DSoddemann TLankes S(2016)VarySched: A Framework for Variable Scheduling in Heterogeneous Environments2016 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER.2016.19(489-492)Online publication date: Sep-2016
https://doi.org/10.1109/CLUSTER.2016.19
Lepers BQuéma VFedorova ALu SRiedel E(2015)Thread and memory placement on NUMA systemsProceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference10.5555/2813767.2813788(277-289)Online publication date: 8-Jul-2015
https://dl.acm.org/doi/10.5555/2813767.2813788
Pusukuri KGupta RBhuyan L(2015)TumblerACM Transactions on Architecture and Code Optimization10.1145/282769812:4(1-24)Online publication date: 16-Nov-2015
https://dl.acm.org/doi/10.1145/2827698
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents