Tuesday Big Data Series

What is a Fair Scheduler in Hadoop?

Fair Scheduler is a pluggable scheduler for Hadoop that lets YARN applications share the resources of a large cluster fairly. As the name suggests, it allocates resources so that, on average over time, all applications get an equal share. By default, fairness is based on memory only, but it can be configured to schedule on both memory and CPU.
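
To make "equal share" concrete, here is a minimal sketch of memory-only fair sharing. It is an illustration, not Hadoop's actual code: the function name `fair_shares`, the application names, and the 8192 MB cluster size are all made up for the example. It splits capacity evenly and hands back whatever an application does not need to the others.

```python
def fair_shares(capacity_mb, demands_mb):
    """Split capacity evenly among apps, never giving an app more than it asks for.

    Illustrative only: the real Fair Scheduler also considers weights,
    minimum shares and queues.
    """
    shares = {app: 0.0 for app in demands_mb}
    active = {app for app, demand in demands_mb.items() if demand > 0}
    remaining = float(capacity_mb)

    while active and remaining > 1e-9:
        slice_mb = remaining / len(active)  # equal slice for this round
        satisfied = set()
        for app in active:
            grant = min(slice_mb, demands_mb[app] - shares[app])
            shares[app] += grant
            remaining -= grant
            if demands_mb[app] - shares[app] <= 1e-9:
                satisfied.add(app)
        if not satisfied:
            break  # everyone took a full slice, nothing left to redistribute
        active -= satisfied  # leftover from satisfied apps goes to the rest

    return shares


# A alone gets the whole (hypothetical) 8192 MB cluster.
print(fair_shares(8192, {"A": 8192}))
# Once B arrives, the same capacity is split evenly.
print(fair_shares(8192, {"A": 8192, "B": 8192}))
# A small B takes only what it needs; A picks up the rest.
print(fair_shares(8192, {"A": 8192, "B": 1024}))
```

The sketch computes the target shares only. On a real cluster, B reaches its share gradually, as containers held by A finish and free up resources.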

Some of the key features of fair scheduling are:

  • Suppose a single application A is running on a cluster. With fair scheduling, it can use all of the cluster's resources because it is the only one running. When a second application B is submitted, the resources that free up are assigned to B until the resources are divided fairly between A and B.
  • Unlike the default Hadoop scheduler, which forms a queue of applications, this lets short applications finish in reasonable time while not starving long-lived applications.
  • Fair sharing can also work with application priorities – the priorities are used as weights to determine the fraction of total resources that each app should get.
  • It organizes applications further into “queues”, and shares resources fairly between these queues. Within each queue, a scheduling policy is used to share resources between the running applications. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured.
  • It assigns guaranteed minimum shares to queues, which is useful for ensuring that certain users, groups or production applications always get sufficient resources. When a queue contains applications, it gets at least its minimum share, but when the queue does not need its full guaranteed share, the excess is split between other running applications. This lets the scheduler guarantee capacity for queues while utilizing resources efficiently when these queues don’t contain applications.
  • It lets all applications run by default, but the number of running applications can be limited per user and per queue through the config file. Hitting a limit does not cause newly submitted applications to fail. They simply wait in the scheduler's queue and start once earlier applications finish and resources become available.
  • The fair scheduler supports hierarchical queues. All queues descend from a queue named “root”. Available resources are distributed among the children of the root queue in the typical fair scheduling fashion. Then, the children distribute the resources assigned to them to their children in the same fashion. Applications may only be scheduled on leaf queues. Queues can be specified as children of other queues by placing them as sub-elements of their parents in the fair scheduler allocation file.
  • It also allows a custom policy to be set for each queue, so that the queue's resources can be shared in whatever way the user wants. By default, it uses memory-based fair sharing, but each queue can be switched to FIFO or the Dominant Resource Fairness policy.
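
The queue features above can be sketched with a small allocation file and a few lines of Python. The queue names (`production`, `etl`, `reports`, `adhoc`) and the 16384 MB cluster size are hypothetical. The elements `weight`, `minResources`, `maxRunningApps` and `schedulingPolicy` are ones the allocation file format supports. The `leaf_shares` helper is an assumption made for illustration: it only splits memory by weight down the hierarchy and ignores demand, minimum shares and running-app limits, all of which the real scheduler takes into account.

```python
import xml.etree.ElementTree as ET

# Hypothetical allocation file: queues nest as sub-elements of their parents.
ALLOCATIONS_XML = """
<allocations>
  <queue name="production">
    <weight>3.0</weight>
    <minResources>4096 mb,4 vcores</minResources>
    <queue name="etl"><weight>2.0</weight></queue>
    <queue name="reports"><weight>1.0</weight></queue>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <maxRunningApps>5</maxRunningApps>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
</allocations>
"""


def leaf_shares(xml_text, cluster_mb):
    """Divide cluster memory among leaf queues in proportion to queue weights.

    Simplified sketch: weights only, no demand, minimum shares or limits.
    """
    root = ET.fromstring(xml_text)
    shares = {}

    def split(parent, path, available_mb):
        children = parent.findall("queue")  # direct child queues only
        if not children:
            shares[path] = available_mb  # leaf queue: applications run here
            return
        # A queue without a <weight> element is treated as weight 1.0.
        weights = [float(c.findtext("weight", default="1.0")) for c in children]
        total = sum(weights)
        for child, weight in zip(children, weights):
            child_path = f"{path}.{child.get('name')}"
            split(child, child_path, available_mb * weight / total)

    split(root, "root", float(cluster_mb))
    return shares


for queue, mb in leaf_shares(ALLOCATIONS_XML, 16384).items():
    print(f"{queue}: {mb:.0f} MB")
```

Here `production` receives three quarters of the cluster because its weight is 3.0 against 1.0 for `adhoc`, and it then divides that amount 2:1 between its two leaf queues.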
