ABSTRACT
Enterprise data centers increasingly adopt a cloud-like architecture that enables the execution of multiple workloads on a shared pool of resources, reduces the data center footprint, and drives down costs. A number of cluster resource managers have appeared over the last few years, aimed at providing a uniform, technology-neutral resource representation and management substrate. Examples include Apache YARN, Google Borg and Omega, Apache Mesos, and IBM Platform EGO.
The Apache Mesos project [2] is emerging as a leading open source resource management technology for server clusters. Mesos offers simple yet powerful and flexible APIs, a highly available and fault-tolerant architecture, scalability to large clusters, isolation between tasks using Linux containers, multi-dimensional resource scheduling, the ability to allocate shares of the cluster to roles representing users or user groups, and a clear separation of concerns between the applications (termed frameworks) and the "cluster kernel", which is Mesos. The Mesos resource scheduler supports the Dominant Resource Fairness (DRF) scheduling discipline [1], a generalization of max-min fairness to multiple resource types. DRF harmonizes the execution of workloads with heterogeneous resource demands by applying max-min fairness to each framework's dominant share, that is, the largest fraction of any single resource allocated to that framework.
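As an illustration of the DRF idea, the following is a minimal sketch of progressive filling, assuming fixed per-task demands for each framework and integer resource quantities. The function name `drf_allocate` and the dictionary layout are hypothetical and do not correspond to the Mesos allocator API. A framework's dominant share is the largest fraction of any one resource it currently holds; each step grants one task to the framework whose dominant share is smallest.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Progressive filling: repeatedly grant one task to the framework
    with the smallest dominant share until no further task fits.

    capacity: {resource: total amount}, e.g. {"cpu": 9, "mem": 18}
    demands:  {framework: {resource: amount per task}}
    Returns {framework: number of tasks granted}.
    """
    used = {r: 0 for r in capacity}
    alloc = {f: {r: 0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}
    active = set(demands)

    def dominant_share(f):
        # Largest fraction of any single resource held by framework f.
        return max(Fraction(alloc[f][r], capacity[r]) for r in capacity)

    while active:
        # sorted() makes tie-breaking deterministic (alphabetical).
        f = min(sorted(active), key=dominant_share)
        need = demands[f]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
                alloc[f][r] += need[r]
            tasks[f] += 1
        else:
            # Assumption of this sketch: a framework whose next task
            # no longer fits simply stops receiving resources.
            active.discard(f)
    return tasks


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem": 18}
    per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
    print(drf_allocate(cluster, per_task))
```

With 9 CPUs and 18 GB of memory, framework A (memory-dominant) ends up with three tasks and framework B (CPU-dominant) with two, so both reach a dominant share of 2/3.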
However, the default Mesos allocation mechanism lacks a number of policy and tenancy capabilities that are important in enterprise deployments. We have investigated the integration of Mesos with the IBM EGO (Enterprise Grid Orchestrator) technology [3], which underpins various high performance computing, analytics and big data clusters in a variety of industry verticals including financial services, life sciences, manufacturing and electronics. We have designed and implemented an experimental integration prototype, and have tested it with SparkBench workloads. We demonstrate how Mesos can be enriched with new resource policy capabilities required for managing enterprise data centers, such as:
• Capturing the hierarchical structure of an enterprise (organisations, departments, groups, teams, users) by defining the corresponding resource consumer tree;
• A fine-grained resource plan that defines the resource share ratio, ownership and lending/borrowing policies for each resource consumer;
• A rich set of resource management policies that make use of the hierarchical resource consumer model and provide fairness and isolation to the members of the hierarchy, including the important ability to change allocations dynamically (time-based policy);
• A Web-based GUI providing a centralized console through which the whole cluster is observed and managed. In particular, the cluster-wide resource management policies are applied through this GUI.
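To make the consumer tree and resource plan concrete, here is a minimal sketch under stated assumptions: resources are counted in interchangeable slots, ownership is split among siblings in proportion to their share ratios, and idle owned slots of lending consumers are handed to borrowing consumers in tree order. The names `Consumer`, `owned_slots`, `leaves_of` and `allocate` are hypothetical, and this is not the EGO policy engine or its API.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    name: str
    share: int = 1        # share ratio relative to sibling consumers
    lends: bool = True    # may idle owned slots be lent to others?
    borrows: bool = True  # may this consumer use others' idle slots?
    demand: int = 0       # slots requested (meaningful for leaves)
    children: list = field(default_factory=list)


def owned_slots(node, slots, out=None):
    """Split `slots` down the tree in proportion to share ratios."""
    out = {} if out is None else out
    if not node.children:
        out[node.name] = slots
        return out
    total = sum(c.share for c in node.children)
    remaining = slots
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        # The last sibling takes the integer-division remainder.
        part = remaining if last else slots * child.share // total
        remaining -= part
        owned_slots(child, part, out)
    return out


def leaves_of(node):
    """Return the leaf consumers of the tree, left to right."""
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves_of(c)]


def allocate(leaves, owned):
    """Satisfy demand from owned slots first, then lend idle slots."""
    alloc = {c.name: min(c.demand, owned[c.name]) for c in leaves}
    pool = sum(owned[c.name] - alloc[c.name] for c in leaves if c.lends)
    for c in leaves:
        if c.borrows and pool > 0:
            extra = min(c.demand - alloc[c.name], pool)
            alloc[c.name] += extra
            pool -= extra
    return alloc


if __name__ == "__main__":
    tree = Consumer("enterprise", children=[
        Consumer("finance", share=2, children=[
            Consumer("risk", demand=50),
            Consumer("trading", demand=10),
        ]),
        Consumer("research", share=1, demand=40),
    ])
    owned = owned_slots(tree, 90)
    print(owned)
    print(allocate(leaves_of(tree), owned))
```

In this example each leaf owns 30 of the 90 slots; `trading` needs only 10, so its 20 idle slots are lent to `risk`, which is visited first. A time-based policy could be approximated in the same sketch by recomputing `owned_slots` with a different set of share ratios when a schedule boundary is crossed.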
- A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI, volume 11, pages 24--24, 2011.
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, volume 11, pages 22--22, 2011.
- IBM Platform Computing. An Introduction to EGO: an enterprise ready resource manager for all workloads, 2015.
Index Terms
- Enterprise Resource Management in Mesos Clusters