Tuesday Big Data Series

What is a Fair Scheduler in Hadoop?

The Fair Scheduler is a pluggable scheduler for Hadoop that lets YARN applications share the resources of a large cluster fairly. As the name suggests, it allocates resources so that all applications get an equal share. By default, fairness is based on memory only, but it can be configured to schedule with both memory… Continue reading What is a Fair Scheduler in Hadoop?
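
To make the "equal share" idea concrete, here is a minimal sketch in Python. It is only an illustration of an even memory split, not the scheduler's actual algorithm (the real Fair Scheduler also accounts for queues, weights, minimum shares and demand). The function name `fair_memory_shares` and the sample numbers are assumptions made up for this example.

```python
def fair_memory_shares(cluster_memory_mb, app_ids):
    """Split the cluster's memory equally among the running applications.

    Hypothetical helper for illustration only; not part of Hadoop or YARN.
    """
    if not app_ids:
        return {}
    share = cluster_memory_mb / len(app_ids)
    return {app_id: share for app_id in app_ids}


# Two applications on an 8 GB cluster: each gets half of the memory.
shares = fair_memory_shares(8192, ["app_1", "app_2"])
print(shares)
```

When a third application starts, calling the function again with three IDs shrinks every share to a third, which is the behaviour the excerpt describes.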

Monday Technology Series

Understanding YARN and its components

Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. It decouples MapReduce’s resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. YARN (sometimes called MR2) is an extended and improved version of MR1. It was… Continue reading Understanding YARN and its components

Friday "Term of the week" Series

Term of the Week : Oozie

Oozie, an open source project, is implemented as a Java web application that runs in a Java servlet container and is distributed under the Apache License 2.0. It is a workflow scheduling system to manage Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack, with YARN… Continue reading Term of the Week : Oozie
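
The idea of combining several jobs sequentially into one logical unit of work can be sketched in a few lines of Python. This is not how Oozie is implemented or configured (Oozie workflows are defined separately and run by the Oozie server); `run_workflow` and the job names below are hypothetical, chosen only to show the control flow.

```python
def run_workflow(jobs):
    """Run (name, callable) jobs in order and stop at the first failure.

    Each callable returns True on success. Returns the names of the jobs
    that completed and the name of the job that failed (or None).
    """
    completed = []
    for name, job in jobs:
        if not job():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical three-step workflow in which the second step fails.
workflow = [
    ("import", lambda: True),
    ("transform", lambda: False),
    ("export", lambda: True),
]
done, failed = run_workflow(workflow)
print(done, failed)
```

Because the steps are treated as one unit, the failure of `transform` prevents `export` from running at all.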

Thursday Management Series

5 reasons why positive feedback is equally important

As part of an organization, all of us have attended feedback sessions. These sessions are generally conducted by immediate managers, who tell you how your performance has been so far and what is expected from you in the future. Some managers misconstrue the concept of effective feedback and focus only on the negative… Continue reading 5 reasons why positive feedback is equally important

Tuesday Big Data Series

What is a Capacity Scheduler in Hadoop?

The Capacity Scheduler is a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster, such that their applications are allocated resources in a timely manner under the constraints of allocated capacities. Before the onset of Big Data and Hadoop, every organization had its own set of resources that had sufficient capacity to meet the… Continue reading What is a Capacity Scheduler in Hadoop?

Monday Technology Series

What is Apache Sqoop?

Apache Sqoop is a tool designed for transferring bulk data between Apache Hadoop and relational databases. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Imports can also be… Continue reading What is Apache Sqoop?
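
The logic behind an incremental load, importing only what changed since the last import, can be sketched with Python's built-in `sqlite3` module standing in for a relational database. This is not Sqoop itself. The `customers` table, its `id` check column and the `import_new_rows` helper are assumptions made up for the illustration.

```python
import sqlite3


def import_new_rows(conn, last_value):
    """Fetch rows whose id is greater than last_value.

    Returns the new rows and the updated high-water mark to use next time.
    """
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id",
        (last_value,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "a"), (2, "b")])

# First run imports everything; the second run only sees the new row.
first_batch, last_seen = import_new_rows(conn, 0)
conn.execute("INSERT INTO customers VALUES (3, 'c')")
second_batch, last_seen = import_new_rows(conn, last_seen)
print(first_batch, second_batch, last_seen)
```

A saved job plays the role of `last_seen` here: it remembers where the previous import stopped so the same job can be re-run without copying old rows again.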

Friday "Term of the week" Series

Term of the Week : Apache Tez

The Apache Tez project is an extensible framework built on top of Apache Hadoop YARN. It lets data processing that earlier took multiple MR jobs run as a single Tez job, which uses a Directed Acyclic Graph (DAG) for data processing. It is used for building high-performance batch and interactive data processing applications.… Continue reading Term of the Week : Apache Tez
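
As a rough illustration of what "a DAG for data processing" means, the sketch below describes a pipeline as a graph of dependent steps and derives a valid execution order with Python's standard `graphlib` module (Python 3.9 or later). It does not use the Tez API; the step names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dag):
    """Return the steps in an order that respects their dependencies.

    dag maps each step to the set of steps it depends on.
    """
    return list(TopologicalSorter(dag).static_order())


# A hypothetical pipeline expressed as one graph instead of several chained jobs.
dag = {
    "scan": set(),
    "filter": {"scan"},
    "aggregate": {"filter"},
    "write": {"aggregate"},
}
order = execution_order(dag)
print(order)
```

Each step runs only after the steps it depends on, so the whole pipeline can be planned and executed as a single unit.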

Thursday Management Series

Guide to one-on-one meetings – For Manager and Subordinate

One-on-one meetings are an important platform for communication between the manager and the subordinate. The purpose of a one-on-one meeting differs for the manager and the employee, but it is equally important for both. As a manager, your purpose is to: Track performance of the employee. Give value-added feedback. Find out more about the employee; whether he is happy,… Continue reading Guide to one-on-one meetings – For Manager and Subordinate

Wednesday Marketing Series

How to use Inbound Marketing?

Inbound Marketing or Online Marketing is the new buzz. Every company these days has a social media presence or a heavy online presence that it uses to market its products or services. This kind of marketing pulls in and attracts people who might be interested in the product. It lets people engage with the brand thus… Continue reading How to use Inbound Marketing?

Tuesday Big Data Series

SQL and HiveQL query examples

Well, we know that HiveQL is very similar to SQL. The detailed Hive Language Manual describes all the important functions and semantics that are used in Hive. Please find below some examples to get you started on Hive. If you have been using SQL in the past and are familiar with the semantics, see how you can use… Continue reading SQL and HiveQL query examples