
Virtual hierarchies to support server consolidation

Published: 09 June 2007


Server consolidation is becoming an increasingly popular technique to manage and utilize systems. This paper develops CMP memory systems for server consolidation where most sharing occurs within Virtual Machines (VMs). Our memory systems maximize shared memory accesses serviced within a VM, minimize interference among separate VMs, facilitate dynamic reassignment of VMs to processors and memory, and support content-based page sharing among VMs. We begin with a tiled architecture where each of 64 tiles contains a processor, private L1 caches, and an L2 bank. First, we reveal why single-level directory designs fail to meet workload consolidation goals. Second, we develop the paper's central idea of imposing a two-level virtual (or logical) coherence hierarchy on a physically flat CMP that harmonizes with VM assignment. Third, we show that the best of our two virtual hierarchy (VH) variants performs 12-58% better than the best alternative flat directory protocol when consolidating Apache, OLTP, and Zeus commercial workloads on our simulated 64-core CMP.
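The abstract's central idea, a two-level virtual coherence hierarchy laid over a physically flat 64-tile CMP, can be illustrated with a short Python sketch. This is not the paper's mechanism: the modulo interleaving, the 64-byte block size, the tile numbers, and the names `VirtualMachine`, `level1_home`, `level2_home`, `request_path`, and `reassign` are all hypothetical. The sketch only shows the shape of the idea: a miss first goes to a home tile inside its own VM, and escalates to a chip-wide home tile only when the VM cannot satisfy it.

```python
from dataclasses import dataclass

NUM_TILES = 64    # the abstract's 64-tile CMP
BLOCK_BYTES = 64  # assumed cache-block size (hypothetical)


@dataclass
class VirtualMachine:
    name: str
    tiles: list[int]  # tiles currently assigned to this VM (hypothetical)


def block_number(addr: int) -> int:
    """Drop the block offset from a physical address."""
    return addr // BLOCK_BYTES


def level1_home(vm: VirtualMachine, addr: int) -> int:
    """Level-1 (intra-VM) home tile: interleave blocks over the VM's own tiles,
    so coherence traffic for data shared inside the VM stays inside the VM."""
    return vm.tiles[block_number(addr) % len(vm.tiles)]


def level2_home(addr: int) -> int:
    """Level-2 (chip-wide) home tile: interleave blocks over all tiles,
    as a flat, single-level directory would."""
    return block_number(addr) % NUM_TILES


def request_path(vm: VirtualMachine, addr: int, satisfied_in_vm: bool) -> list[int]:
    """Home tiles a miss visits. `satisfied_in_vm` is a simplifying stand-in
    for whether the level-1 home can resolve the request without leaving the VM."""
    path = [level1_home(vm, addr)]
    if not satisfied_in_vm:
        path.append(level2_home(addr))  # escalate to the global level
    return path


def reassign(vm: VirtualMachine, new_tiles: list[int]) -> None:
    """Move a VM to a new set of tiles. Only the tile list changes here;
    migrating or invalidating blocks cached on the old tiles is not modeled."""
    vm.tiles = list(new_tiles)


if __name__ == "__main__":
    vm0 = VirtualMachine("apache", tiles=[0, 1, 8, 9])  # hypothetical 2x2 group on an 8x8 grid
    addr = 0x12345 * BLOCK_BYTES
    print(request_path(vm0, addr, satisfied_in_vm=True))   # [1]
    print(request_path(vm0, addr, satisfied_in_vm=False))  # [1, 5]
```

Because the level-1 home in this sketch depends only on the VM's current tile list, moving a VM changes where its intra-VM requests go without touching the chip-wide mapping. A real design would also have to handle blocks still cached on the old tiles, which the sketch deliberately ignores.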


    Information & Contributors


    Published In

    ACM SIGARCH Computer Architecture News  Volume 35, Issue 2
    May 2007
    527 pages
    • ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
      June 2007
      542 pages
      General Chair: Dean Tullsen
      Program Chair: Brad Calder
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in SIGARCH Volume 35, Issue 2


    Author Tags

    1. cache coherence
    2. chip multiprocessors (CMPs)
    3. memory hierarchies
    4. multicore
    5. partitioning
    6. server consolidation
    7. virtual machines


    • Article



    Bibliometrics & Citations


    Article Metrics

    • Downloads (Last 12 months)19
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 15 Feb 2025



    Cited By

    • (2019) Linear Time Algorithms for Multiple Cluster Scheduling and Multiple Strip Packing. Euro-Par 2019: Parallel Processing, pp. 103-116. DOI: 10.1007/978-3-030-29400-7_8. Online publication date: 26-Aug-2019.
    • (2018) Joint Load-Balancing and Energy-Aware Virtual Machine Placement for Network-on-Chip Systems. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), pp. 124-132. DOI: 10.1109/UCC.2018.00021. Online publication date: Dec-2018.
    • (2016) Performance Analysis of Cache Coherence Protocols for Multi-core Architectures. Proceedings of the International Conference on Advances in Information Communication Technology & Computing, pp. 1-7. DOI: 10.1145/2979779.2979801. Online publication date: 12-Aug-2016.
    • (2015) On the Feasibility of Side-Channel Attacks in a Virtualized Environment. E-Business and Telecommunications, pp. 319-339. DOI: 10.1007/978-3-319-25915-4_17. Online publication date: 30-Dec-2015.
    • (2013) The McPAT Framework for Multicore and Manycore Architectures. ACM Transactions on Architecture and Code Optimization, 10(1), pp. 1-29. DOI: 10.1145/2445572.2445577. Online publication date: 1-Apr-2013.
    • (2013) The Impact of Dynamic Directories on Multicore Interconnects. Computer, 46(10), pp. 32-39. DOI: 10.1109/MC.2013.334. Online publication date: 1-Oct-2013.
    • (2013) Rethinking Virtual Machine Interference in the Era of Cloud Applications. 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, pp. 190-197. DOI: 10.1109/HPCC.and.EUC.2013.36. Online publication date: Nov-2013.
    • (2012) Dynamic directories. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 479-484. DOI: 10.5555/2492708.2492829. Online publication date: 12-Mar-2012.
    • (2012) Measuring interference between live datacenter applications. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.5555/2388996.2389066. Online publication date: 10-Nov-2012.
    • (2012) Measuring interference between live datacenter applications. Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1109/SC.2012.78. Online publication date: 10-Nov-2012.