Hive or Pig?

May 17, 2016 | May 2, 2016 | Garshita Gupta

Which one of them is your favorite: Hive or Pig? What do you prefer to work with? People are often confused about when to use Hive and when to use Pig. And while in most cases either can be used, the question that arises is why both of them exist in… Continue reading Hive or Pig?

Monday Technology Series

What is Apache Hive?

May 16, 2016 | May 2, 2016 | Garshita Gupta

Apache Hive is a data warehouse infrastructure built on top of Hadoop that allows querying and managing large datasets residing in distributed storage. It provides an SQL-like language called HiveQL with schema on read, and transparently converts queries to MapReduce, Tez, or Spark jobs. All these execution engines run on Hadoop YARN. The HiveQL language also… Continue reading What is Apache Hive?
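To make "schema on read" a little more concrete, here is a minimal Python sketch of the idea. It is an illustration only, not how Hive is implemented: the raw rows, the `SCHEMA` list, and the `read_with_schema` helper are all hypothetical names made up for this example. The point is that the data is stored untyped and the schema is applied only when the data is read.

```python
import csv
import io

# Raw data is stored as plain text; nothing is validated when it is written.
RAW_ROWS = "tom,california,31\nana,texas,not_a_number\n"

# Hypothetical schema: column names and the type to apply at read time.
SCHEMA = [("name", str), ("state", str), ("age", int)]


def read_with_schema(raw_text, schema):
    """Apply the schema while reading; values that do not fit become None."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (column, cast), value in zip(schema, record):
            try:
                row[column] = cast(value)
            except ValueError:
                # The stored value does not match the declared type.
                row[column] = None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_ROWS, SCHEMA)
```

The second row loads without error even though its age is not a number; the mismatch only shows up at read time, as a missing value.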

Friday "Term of the week" Series

Term of the Week : HDFS

May 13, 2016 | May 2, 2016 | Garshita Gupta

HDFS, or the Hadoop Distributed File System, is a Java-based filesystem for storing large volumes of data in the Hadoop framework. It solves the problem of storing and managing enormous amounts of data with the high scalability, fault tolerance, high availability, and cost efficiency that make it so popular. Related Posts: HDFS and its features that make… Continue reading Term of the Week : HDFS

Tuesday Big Data Series

Understanding Pig Data Model

May 10, 2016 | May 2, 2016 | Garshita Gupta

Pig has a simple yet rich data model which consists of the following four types:

Atom: An atom is a single atomic value, which can be a string or a number. Examples: ‘tom’ or 2.

Tuple: A tuple is a sequence of fields, each of which can be of any data type. Examples: (‘tom’, ‘california’) or… Continue reading Understanding Pig Data Model
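As a rough analogy, the two types visible in this excerpt can be mirrored with plain Python values. This is only a sketch for intuition, not Pig Latin; the variable names and the `field` helper are hypothetical, and the remaining types from the full post are deliberately left out.

```python
# An atom is a single value: a string or a number.
name_atom = "tom"
count_atom = 2

# A tuple is an ordered sequence of fields; fields may have different types.
person = ("tom", "california")
mixed = ("tom", 2)


def field(tup, position):
    """Return the field at a zero-based position, similar in spirit to Pig's $0, $1."""
    return tup[position]
```

For example, `field(person, 1)` picks out the second field of the tuple, much like `$1` would in a Pig Latin script.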

Monday Technology Series

Apache Pig and its awesome features!

May 9, 2016 | May 2, 2016 | Garshita Gupta | 3 Comments

Pig is an open-source scripting platform from Apache used for analyzing and processing large data sets. It allows users to express complex MapReduce problems in a simple scripting language called Pig Latin. Pig translates the Pig Latin script into MapReduce so that it can be executed within YARN for access to a single dataset stored… Continue reading Apache Pig and its awesome features!

Tuesday Big Data Series

Understanding HDFS quotas

May 3, 2016 | April 5, 2016 | Garshita Gupta

Every Hadoop system has a Hadoop administrator and Hadoop users/developers. The administrator is responsible for deployment and maintenance of the entire infrastructure. This includes cluster availability, file system management, security, installation of the latest updates, and everything else needed to keep the system up and running. The administrator is also responsible for… Continue reading Understanding HDFS quotas

Monday Technology Series

What is Oozie?

May 2, 2016 | May 2, 2016 | Garshita Gupta

Why do we need Oozie? The Hadoop stack consists of a variety of tools like Pig, MapReduce, Hive, HBase, Sqoop, etc. At times, when dealing with large data sets, we might have to use a combination of these technologies along with plain old Java, Python, Perl, or shell scripts to get work done… Continue reading What is Oozie?

Tuesday Big Data Series

HDFS: Filesystem Metadata and how it persists

April 26, 2016 | April 1, 2016 | Garshita Gupta

In the post that explains the HDFS architecture, we saw that the HDFS namespace is stored and maintained by the NameNode. What is the HDFS namespace? The namespace is a hierarchy of directories, files, and blocks in HDFS. It supports file system operations such as creation, modification, deletion, and listing of files and directories. One of the important features of HDFS is… Continue reading HDFS: Filesystem Metadata and how it persists
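The idea of a namespace as a hierarchy of directories, files, and blocks can be sketched with a small in-memory tree. This is a toy model for intuition, assuming directories are dicts and files are lists of block IDs; the `Namespace` class and its method names are hypothetical and do not correspond to the NameNode's actual data structures or API.

```python
class Namespace:
    """Toy in-memory namespace: directories are dicts, files are lists of block IDs."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _split(path):
        # "/user/tom" -> ["user", "tom"]; "/" -> []
        return [part for part in path.split("/") if part]

    def _walk(self, parts):
        node = self.root
        for part in parts:
            node = node[part]
        return node

    def mkdir(self, path):
        """Create a directory, including any missing parents."""
        node = self.root
        for part in self._split(path):
            node = node.setdefault(part, {})

    def create_file(self, path, block_ids):
        """Record a file as the list of block IDs that hold its data."""
        *parents, name = self._split(path)
        self._walk(parents)[name] = list(block_ids)

    def delete(self, path):
        """Remove a file or directory entry from its parent."""
        *parents, name = self._split(path)
        del self._walk(parents)[name]

    def listing(self, path):
        """Return the sorted names directly under a directory."""
        return sorted(self._walk(self._split(path)))
```

A short usage example: after `ns = Namespace()`, `ns.mkdir("/user/tom")`, and `ns.create_file("/user/tom/a.txt", [1, 2])`, calling `ns.listing("/user/tom")` returns the single entry for that file.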

Tuesday Big Data Series

Understanding NameNode and DataNode in HDFS

April 19, 2016 | April 1, 2016 | Garshita Gupta

HDFS has a master/slave architecture and is built from two kinds of nodes: a NameNode (which acts as the master) and DataNodes (which act as slaves). NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run a GNU/Linux operating system (OS). HDFS is built using the Java language,… Continue reading Understanding NameNode and DataNode in HDFS

Tuesday Big Data Series

HDFS and its features that make it so awesome!

April 12, 2016 | April 1, 2016 | Garshita Gupta

HDFS, the Hadoop Distributed File System, is a Java-based filesystem for storing large volumes of data in the Hadoop framework. When we think of dealing with enormous amounts of data, the first thing that comes to mind is where to store this data and how to store it. We know that every single bit… Continue reading HDFS and its features that make it so awesome!

MarketEng

… marketing engineered

Tag: big data

