Tuesday Big Data Series

What is a Capacity Scheduler in Hadoop?

Capacity Scheduler is a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster, so that their applications are allocated resources in a timely manner within the constraints of their allocated capacities.

Before the onset of Big Data and Hadoop, every organization had its own set of resources with sufficient capacity to meet its SLA under peak or near-peak conditions. This led to poor utilization of resources and the overhead of managing multiple independent clusters, one per organization. For an organization that wanted to adopt Hadoop, maintaining a private Hadoop cluster could be very expensive, and the cluster might still sit under-utilized at times. To overcome this, organizations started sharing clusters instead of building their own. This way they get the benefits that Hadoop promises, such as scalability, while remaining cost-effective.

But sharing clusters introduces a problem: organizations must also share the resources that are critical for their SLAs. Let us assume organization A has a revenue-impacting job running on the cluster that affects millions of users. Now organization B wants to do some research and analysis and starts a resource-intensive job on the same cluster. If the resources are not managed efficiently, B’s job could consume resources that A’s job needs, which could be fatal for A’s job. To prevent such scenarios, Capacity Scheduler was designed to allow sharing a large cluster while giving each organization capacity guarantees. All the resources in the Hadoop cluster are shared among multiple organizations, who collectively fund the cluster based on their computing needs. As an added benefit, an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.
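To make the idea of capacity guarantees plus borrowed excess concrete, here is a minimal Python sketch. It is not how Hadoop implements the scheduler; the function name `allocate`, the queue names, and the rule of dividing spare capacity in proportion to each queue's guarantee are all assumptions made for illustration. It also recomputes the allocation from scratch each time, so it does not model how borrowed resources are returned as running tasks finish.

```python
EPS = 1e-9  # tolerance for floating-point comparisons

def allocate(total, guaranteed_pct, demand):
    """Split `total` resource units among queues.

    guaranteed_pct: queue name -> guaranteed share of the cluster, in percent
    demand:         queue name -> resource units the queue currently wants
    """
    # Step 1: every queue receives what it asks for, up to its guarantee.
    alloc = {q: min(demand.get(q, 0), total * pct / 100)
             for q, pct in guaranteed_pct.items()}
    free = total - sum(alloc.values())

    # Step 2: unused capacity is lent to queues that still want more
    # (hypothetical policy: proportional to the guarantees).
    hungry = {q for q in guaranteed_pct if demand.get(q, 0) > alloc[q]}
    while free > EPS and hungry:
        weight = sum(guaranteed_pct[q] for q in hungry)
        if weight == 0:
            break  # only queues with a 0% guarantee remain; stop lending
        granted = 0.0
        for q in list(hungry):
            extra = min(free * guaranteed_pct[q] / weight, demand[q] - alloc[q])
            alloc[q] += extra
            granted += extra
            if demand[q] - alloc[q] <= EPS:
                hungry.discard(q)  # this queue is fully satisfied
        free -= granted
    return alloc

# A is guaranteed 60% and B 40% of a 100-unit cluster.
# While A is mostly idle, B borrows A's unused share.
print(allocate(100, {"A": 60, "B": 40}, {"A": 30, "B": 90}))
# When both are busy, each is held to its guarantee.
print(allocate(100, {"A": 60, "B": 40}, {"A": 80, "B": 90}))
```

In the first call A keeps the 30 units it asked for and B grows to 70; in the second, A gets 60 and B gets 40, so B’s research job cannot eat into A’s guaranteed share.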

Some of the key features of Capacity Scheduler are:

  • It supports multi-tenancy. Capacity Scheduler enforces a set of limits to ensure that no single organization, user, or queue consumes more than its share of resources. Since these clusters are shared by multiple tenants, this ensures that no single job uses up the entire cluster.
  • To support sharing of clusters, Capacity Scheduler uses the concept of queues. It supports hierarchical queues, which ensure that resources are shared among the sub-queues of an organization before other queues are allowed to use free resources, thereby providing affinity for sharing free resources among applications of a given organization.
  • It offers capacity guarantees by allocating each queue a fraction of the capacity of the grid. So, at any given time, the applications submitted to a queue have access to the capacity allocated to that queue.
  • It offers security. Users are assigned queues, and each queue has strict ACLs that control which users can submit applications to it. There are also safeguards to ensure that users cannot view or modify applications from other users.
  • It offers elasticity. Free resources can be allocated to any queue beyond its capacity. When queues running below capacity later demand these resources, they are reassigned to applications on those queues as the tasks scheduled on them complete (preemption is not supported).
  • It supports resource-intensive applications: an application can optionally specify higher resource requirements than the default, thereby accommodating applications with differing resource requirements. Currently, memory is the resource requirement supported.
  • It allows users to map a job to a specific queue based on the user or group.
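As a rough illustration of how several of these features (queues, capacity fractions, elasticity limits, ACLs, and user/group mappings) end up as configuration, the Python sketch below generates properties of the kind found in `capacity-scheduler.xml`. The property names follow the YARN Capacity Scheduler naming scheme as best understood here and should be checked against the documentation for your Hadoop version; the queue names, users, groups, and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity"  # assumed YARN property prefix

def build_properties(queues, mappings):
    """Turn a queue description into a flat name -> value property dict.

    queues:   queue name -> dict with 'capacity', 'max_capacity' (percent)
              and 'submit_acl' ("users groups", each comma-separated)
    mappings: list of "u:<user>:<queue>" or "g:<group>:<queue>" strings
    """
    if sum(q["capacity"] for q in queues.values()) != 100:
        raise ValueError("sibling queue capacities must add up to 100")

    props = {f"{PREFIX}.root.queues": ",".join(queues)}
    for name, q in queues.items():
        base = f"{PREFIX}.root.{name}"
        props[f"{base}.capacity"] = str(q["capacity"])              # guarantee
        props[f"{base}.maximum-capacity"] = str(q["max_capacity"])  # elasticity cap
        props[f"{base}.acl_submit_applications"] = q["submit_acl"]  # security
    props[f"{PREFIX}.queue-mappings"] = ",".join(mappings)          # user/group -> queue
    return props

def to_xml(props):
    """Render the properties in the Hadoop <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical two-tenant setup: a production queue and a research queue.
example_queues = {
    "prod":     {"capacity": 70, "max_capacity": 100, "submit_acl": "alice ops"},
    "research": {"capacity": 30, "max_capacity": 50,  "submit_acl": "bob analysts"},
}
example_mappings = ["u:alice:prod", "g:analysts:research"]

print(to_xml(build_properties(example_queues, example_mappings)))
```

Here the research queue is guaranteed 30% of the cluster and may grow to at most 50% when capacity is free, while production may use the whole cluster if nobody else needs it.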

 
