Fair Scheduler is a pluggable scheduler for Hadoop that lets YARN applications share resources fairly in large clusters. As the name suggests, it allocates resources so that all applications get an equal share. By default, fairness is based on memory, but it can be configured to schedule with both memory… Continue reading What is a Fair Scheduler in Hadoop?
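To make the idea of an "equal share" of memory concrete, here is a minimal Python sketch of max-min fair allocation. It is an illustration only, not the Fair Scheduler's actual implementation: the function name `max_min_fair_share`, the application names, and the memory figures are all hypothetical, and the sketch ignores queues, weights, and minimum shares.

```python
def max_min_fair_share(capacity_mb, demands_mb):
    """Split capacity_mb across apps so that no app gets more than it asked
    for and leftover memory is shared equally among apps that want more.

    Hypothetical helper for illustration; not part of Hadoop or YARN.
    """
    allocation = {app: 0.0 for app in demands_mb}
    remaining = float(capacity_mb)
    unsatisfied = {app for app, demand in demands_mb.items() if demand > 0}

    while unsatisfied and remaining > 1e-9:
        # Offer every unsatisfied app an equal slice of what is left.
        share = remaining / len(unsatisfied)
        satisfied_now = set()
        for app in unsatisfied:
            need = demands_mb[app] - allocation[app]
            grant = min(share, need)
            allocation[app] += grant
            remaining -= grant
            if allocation[app] >= demands_mb[app] - 1e-9:
                satisfied_now.add(app)
        if not satisfied_now:
            # Every app took a full equal slice, so capacity is used up.
            break
        unsatisfied -= satisfied_now

    return allocation


# Three hypothetical applications competing for 12,000 MB of memory.
shares = max_min_fair_share(12000, {"app_a": 2000, "app_b": 8000, "app_c": 8000})
print(shares)
```

In this example, `app_a` asks for less than an equal third of the cluster, so it gets everything it requested, and the memory it leaves unused is split evenly between the other two applications.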
Capacity Scheduler is a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster, so that their applications are allocated resources in a timely manner within the constraints of their allocated capacities. Before the onset of Big Data and Hadoop, every organization had its own set of resources that had sufficient capacity to meet the… Continue reading What is a Capacity Scheduler in Hadoop?
Well, we know that HiveQL is very similar to SQL. The detailed Hive Language Manual describes all the important functions and semantics used in Hive. Below are some examples to get you started on Hive. If you have used SQL in the past and are familiar with its semantics, see how you can use… Continue reading SQL and HiveQL query examples
Which one is your favorite – Hive or Pig? Which do you prefer to work with? People are often confused about when to use Hive and when to use Pig. And while in most cases either can be used, the question that arises is why both of them exist in… Continue reading Hive or Pig?
Pig has a simple yet rich data model which consists of the following four types: Atom: An atom consists of a single atomic value, which can be a string or a number. Examples – ‘tom’ or 2. Tuple: A tuple is a sequence of fields, each of which can be of any datatype. Examples – (‘tom’, ‘california’) or… Continue reading Understanding Pig Data Model
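As a rough analogy, the two types named in the excerpt above can be mirrored with plain Python values. This is only a sketch for readers who know Python better than Pig: the aliases `Atom` and `PigTuple` and the helper `field` are hypothetical names, not part of Pig, and the sketch covers only atoms and tuples.

```python
from typing import Any, Tuple, Union

# An atom is a single atomic value: a string or a number.
Atom = Union[str, int, float]

# A tuple is an ordered sequence of fields; each field may be of any type.
PigTuple = Tuple[Any, ...]

name: Atom = "tom"
count: Atom = 2
person: PigTuple = ("tom", "california")


def field(t: PigTuple, index: int) -> Any:
    """Return the field at a zero-based position, similar in spirit to
    Pig's positional references such as $0 and $1."""
    return t[index]


print(field(person, 0), field(person, 1))
```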
Every Hadoop system has a Hadoop administrator and Hadoop users/developers. The administrator is responsible for deployment and maintenance of the entire infrastructure. This includes cluster availability, file system management, security, installation of the latest updates, and everything else needed to keep the system up and running. The administrator is also responsible for… Continue reading Understanding HDFS quotas
In the post that explains the HDFS architecture, we saw that the HDFS namespace is stored and maintained by the NameNode. What is the HDFS namespace? The namespace is a hierarchy of directories, files and blocks in HDFS. It supports file system operations such as creation, modification, deletion and listing of files and directories. One of the important features of HDFS is… Continue reading HDFS: Filesystem Metadata and how it persists