Dynamic QoS management for chip multiprocessors

Authors:
Bin Li, Princeton University, Hillsboro, OR
Li-Shiuan Peh, Massachusetts Institute of Technology, Cambridge, MA
Li Zhao, Intel Labs, Hillsboro, OR
Ravi Iyer, Intel Labs, Hillsboro, OR

ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 3, Article No.: 17, pp 1–29, https://doi.org/10.1145/2355585.2355590

Published: 05 October 2012


Abstract

With the continued scaling of semiconductor technologies, the chip multiprocessor (CMP) has become the de facto design for modern high-performance computer architectures. It is expected that more and more applications with diverse requirements will run simultaneously on the CMP platform. However, this will create contention for shared resources such as the last-level cache, network-on-chip (NoC) bandwidth, and off-chip memory bandwidth, thus significantly affecting performance and quality of service (QoS). In this environment, efficient resource sharing and a guarantee of a certain level of performance are highly desirable. Researchers have proposed different frameworks for providing QoS. Most of these frameworks focus on managing QoS for an individual resource. Coordinated management of multiple QoS-aware shared resources at runtime remains an open problem. Recent work proposed a class-of-service-based framework to jointly manage cache, NoC, and memory resources. However, that work allocates shared resources statically at the beginning of application runtime and does not dynamically track, manage, and share resources across applications. In this article, we address this limitation by proposing dynamic resource management policies that monitor the resource usage of applications at runtime and then steal resources from high-priority applications for lower-priority ones. The goal is to maintain the targeted level of performance for high-priority applications while improving the performance of lower-priority applications. We use a proportional-integral (PI) feedback-controller-based technique to maintain stability in our framework. Our evaluation results show that our policy can significantly improve performance for lower-priority applications while maintaining the performance of high-priority applications, thus demonstrating the effectiveness of our dynamic QoS resource management policy.
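The resource-stealing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the controller's equations, gains, or sampling interval, so everything below is an assumption made for illustration: the names `PIController` and `rebalance`, the gains `kp` and `ki`, the 16-unit resource pool (for example, cache ways), and the normalized performance metric are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time proportional-integral controller (illustrative gains)."""
    kp: float
    ki: float
    integral: float = 0.0

    def update(self, error: float) -> float:
        # Accumulate the error so that a persistent shortfall keeps pushing
        # the output even when the instantaneous error is small.
        self.integral += error
        return self.kp * error + self.ki * self.integral


def rebalance(high_share, target_perf, measured_perf, controller,
              total=16, min_share=1):
    """Return (high_share, low_share) for the next monitoring interval.

    error > 0: the high-priority application is below its target, so it
    gets resources back. error < 0: it has slack, so some of its share is
    handed to the lower-priority application.
    """
    error = (target_perf - measured_perf) / target_perf
    delta = round(controller.update(error) * total)
    # Clamp so that each side always keeps at least min_share units.
    new_high = max(min_share, min(total - min_share, high_share + delta))
    return new_high, total - new_high
```

For example, with `PIController(kp=0.5, ki=0.1)`, a high-priority application holding 12 of 16 units and running 20% above its target would give up 2 units in the next interval, under these assumed gains. The integral term is what lets the controller settle on a steady allocation instead of oscillating around the target, which is the usual motivation for choosing PI control over a purely proportional rule.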



Published in

ACM Transactions on Architecture and Code Optimization Volume 9, Issue 3
September 2012
313 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2355585

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 October 2012
- Accepted: 1 March 2012
- Revised: 1 November 2011
- Received: 1 November 2009
Author Tags
Cache
joint resource management
network-on-chip (NoC)
quality-of-service (QoS)
Qualifiers
- research-article
- Research
- Refereed