Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. It decouples MapReduce’s resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. YARN (sometimes called MR2) is an extended and improved version of MR1. It was… Continue reading Understanding YARN and it’s components
Apache Sqoop is a tool designed for transferring bulk data between Apache Hadoop and relational databases. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Imports can also be… Continue reading What is Apache Sqoop?
The Apache Tez project is an extensible framework built on top of Apache Hadoop YARN. It lets data processing that previously took multiple MR jobs run as a single Tez job, which uses a Directed Acyclic Graph (DAG) for data processing. It is used for building high-performance batch and interactive data processing applications. It drastically improves… Continue reading What is Apache Tez?
Apache Hive is a data warehouse infrastructure built on top of Hadoop that allows querying and managing large datasets residing in distributed storage. It provides an SQL-like language called HiveQL with schema on read and transparently converts queries to MapReduce, Tez, or Spark jobs. All of these execution engines run on Hadoop YARN. The HiveQL language also… Continue reading What is Apache Hive?
Apache Pig is an open-source scripting platform used for analyzing and processing large data sets. It allows users to express complex MapReduce jobs in a simple scripting language called Pig Latin. Pig translates the Pig Latin script into MapReduce so that it can be executed within YARN for access to a single dataset stored… Continue reading Apache Pig and it’s awesome features!
Why do we need Oozie? The Hadoop stack consists of a variety of tools such as Pig, MapReduce, Hive, HBase, and Sqoop. At times, when dealing with large data sets, we might have to use a combination of these technologies along with plain old Java, Python, Perl, or shell scripts to get work done… Continue reading What is Oozie?
A bounced email means that the email was never delivered to the recipient. Does it happen? Yes! Does it happen often? Yes, it can happen often! Why does it happen? Well, let’s look at the reasons: 1. The recipient’s mailbox is full or has exceeded its quota: An email will bounce until there is space for it in the… Continue reading 7 Reasons why your email may have bounced