Tuesday Big Data Series

What is a Fair Scheduler in Hadoop?

Fair Scheduler is a pluggable scheduler for Hadoop that allows YARN applications to share resources in large clusters fairly. As the name suggests, it allocates resources such that all applications get an equal share. By default, this is done on the basis of memory. But it can be configured to schedule with both memory…

Tuesday Big Data Series

What is a Capacity Scheduler in Hadoop?

Capacity Scheduler is a pluggable scheduler for Hadoop which allows multiple tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under the constraints of allocated capacities. Before the onset of Big Data and Hadoop, every organization had its own set of resources that had sufficient capacity to meet the…

Tuesday Big Data Series

SQL and HiveQL query examples

Well, we know that HiveQL is very similar to SQL. The detailed Hive Language Manual describes all the important functions and semantics that are used in Hive. Please find below some examples to get you started on Hive. If you have been using SQL in the past and are familiar with the semantics, see how you can use…

Tuesday Big Data Series

Hive or Pig?

Which one of them is your favorite – Hive or Pig? What do you prefer to work with? People are often confused about when to use Hive and when to use Pig. And, while in most cases either can be used, the question that arises is why both of them exist in…

Tuesday Big Data Series

Understanding Pig Data Model

Pig has a simple yet rich data model which consists of the following four types:

Atom: An atom consists of a single atomic value which can be a string or a number. Examples – ‘tom’ or 2

Tuple: A tuple is a sequence of fields, each of which can be of any datatype. Examples – (‘tom’, ‘california’) or…

Tuesday Big Data Series

Understanding HDFS quotas

Every Hadoop system has a Hadoop administrator and Hadoop users/developers. The administrator is responsible for deployment and maintenance of the entire infrastructure. This includes cluster availability, file system management, security, installation of the latest updates, and everything else needed to keep the system up and running. The administrator is also responsible for…