In the previous post, we saw an introduction to Apache Storm, its characteristics, and its different use cases. Now, let us take a look at the Apache Storm architecture. A Storm cluster is similar to a Hadoop cluster. In a Hadoop cluster, we run what are called MapReduce jobs, but in Storm we… Continue reading How does Apache Storm work?

Tuesday Big Data Series

What is Apache Storm?

July 12, 2016 | July 9, 2016 | Garshita Gupta | 1 Comment

The ability to extract, transform, and process real-time data is critical today. Initially, there was no support for real-time data processing. With growing demand and steady technological progress, however, several technologies now support it. Apache Storm is one of them. Apache Storm is a system for… Continue reading What is Apache Storm?

Tuesday Big Data Series

INFOGRAPHIC – Apache Pig

July 5, 2016 | June 23, 2016 | Garshita Gupta

To know more, read the complete post here.

Tuesday Big Data Series

An introduction to MongoDB

June 28, 2016 | June 8, 2016 | Garshita Gupta

MongoDB is one of the leading NoSQL databases and part of a new generation of databases. In the past, web applications used relational databases to store data. But as data grows, scalability and availability become major concerns for any web application (or organization). MongoDB was designed with web applications in mind.… Continue reading An introduction to MongoDB

Tuesday Big Data Series

What is NoSQL?

June 21, 2016 | June 8, 2016 | Garshita Gupta

So, what has changed in the past few years? The volume of data has increased tremendously. The kind of data we are dealing with has changed. We no longer have only plain-old text to deal with. We have audio, video, images, and other complex data formats that need to be handled. Number of users… Continue reading What is NoSQL?

Tuesday Big Data Series

What is a Fair Scheduler in Hadoop?

June 7, 2016 | May 22, 2016 | Garshita Gupta

Fair Scheduler is a pluggable scheduler for Hadoop that allows YARN applications to share resources in large clusters fairly. As the name suggests, it allocates resources such that all applications get an equal share. By default, this is done on the basis of memory. But it can be configured to schedule with both memory… Continue reading What is a Fair Scheduler in Hadoop?
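To make the idea of an "equal share" concrete, here is a minimal sketch of a memory-only fair split. It is not the Fair Scheduler's actual code: the function name `fair_shares`, the application names, and the megabyte figures are all made up for illustration, and it assumes a simple max-min rule where an application that needs less than its equal share gives the leftover back to the others.

```python
def fair_shares(total_memory_mb, demands_mb):
    """Split total_memory_mb across applications using a max-min fair rule.

    demands_mb maps a (hypothetical) application name to the memory it wants.
    Returns a dict mapping each application to its allocated memory.
    """
    shares = {app: 0.0 for app in demands_mb}
    # Only applications that still want memory take part in the split.
    active = {app for app, demand in demands_mb.items() if demand > 0}
    remaining = float(total_memory_mb)

    while active and remaining > 1e-9:
        equal = remaining / len(active)
        # Applications whose outstanding demand fits inside an equal slice.
        satisfied = {a for a in active if demands_mb[a] - shares[a] <= equal}
        if not satisfied:
            # Everyone wants more than an equal slice: split evenly and stop.
            for a in active:
                shares[a] += equal
            remaining = 0.0
            break
        # Fully serve the small applications, then re-split what is left.
        for a in satisfied:
            remaining -= demands_mb[a] - shares[a]
            shares[a] = float(demands_mb[a])
        active -= satisfied

    return shares


if __name__ == "__main__":
    # Illustrative numbers only: a 12,000 MB cluster and three applications.
    print(fair_shares(12000, {"app_a": 2000, "app_b": 8000, "app_c": 8000}))
```

In this sketch, an application that asks for less than an equal slice gets exactly what it asked for, and the remaining applications split the rest evenly.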

Tuesday Big Data Series

What is a Capacity Scheduler in Hadoop?

May 31, 2016 | May 19, 2016 | Garshita Gupta

Capacity Scheduler is a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster, such that their applications are allocated resources in a timely manner under the constraints of allocated capacities. Before the onset of Big Data and Hadoop, every organization had its own set of resources that had sufficient capacity to meet the… Continue reading What is a Capacity Scheduler in Hadoop?

Tuesday Big Data Series

SQL and HiveQL query examples

May 24, 2016 | May 2, 2016 | Garshita Gupta

Well, we know that HiveQL is very similar to SQL. The detailed Hive Language Manual describes all the important functions and semantics that are used in Hive. Please find below some examples to get you started on Hive. If you have been using SQL in the past and are familiar with the semantics, see how you can use… Continue reading SQL and HiveQL query examples

Tuesday Big Data Series

Hive or Pig?

May 17, 2016 | May 2, 2016 | Garshita Gupta

Which one is your favorite: Hive or Pig? Which do you prefer to work with? People are often confused about when to use Hive and when to use Pig. And while in most cases either can be used, the question that arises is why both of them exist in… Continue reading Hive or Pig?

MarketEng

… marketing engineered

Tag: Tuesday Big Data Series

INFOGRAPHIC – HDFS and its features that make it so awesome!

How does Apache Storm work?
