Tuesday Big Data Series

INFOGRAPHIC – HDFS and its features that make it so awesome!

Read more here.

Tuesday Big Data Series

How does Apache Storm work?

In the previous post, we saw an introduction to Apache Storm, its characteristics, and its different use cases. Now, let us take a look at the Apache Storm architecture. A Storm cluster is similar to a Hadoop cluster. In a Hadoop cluster, we run what are called MapReduce jobs, but in Storm we… Continue reading How does Apache Storm work?

Tuesday Big Data Series

What is Apache Storm?

The ability to extract, transform and process real-time data is critical today. In the beginning, there wasn’t any support for real-time data processing. But with growing demand and constant technological progress, quite a few technologies now support real-time data processing. Apache Storm is one of them. Apache Storm is a system for… Continue reading What is Apache Storm?

Tuesday Big Data Series

INFOGRAPHIC – Apache Pig

To know more, read the complete post here.

Tuesday Big Data Series

An introduction to MongoDB

MongoDB is one of the leading NoSQL databases. It belongs to a new generation of databases. In the past, web applications used relational databases to store data. But as data grows, scalability and availability become some of the main concerns of any web application (or organization). MongoDB was designed with web applications in mind.… Continue reading An introduction to MongoDB

Tuesday Big Data Series

What is NoSQL?

So, what has changed in the past few years? The volume of data has increased tremendously. The kind of data we are dealing with has changed. We no longer have just the plain-old text format to deal with. We have audio, video, images, and other complex formats of data that need to be dealt with. Number of users… Continue reading What is NoSQL?

Tuesday Big Data Series

5 awesome features of Apache Spark

Apache Spark is another Apache project that offers parallel data processing and can work with Hadoop to develop Big Data applications. It is a fast and general engine for large-scale data processing. Let us look at some of the features of Apache Spark one by one. Real-Time Processing: Unlike MapReduce, Spark can handle… Continue reading 5 awesome features of Apache Spark

Tuesday Big Data Series

What is a Fair Scheduler in Hadoop?

Fair Scheduler is a pluggable scheduler for Hadoop that allows YARN applications to share resources in large clusters fairly. As the name suggests, it allocates resources such that all applications get, on average, an equal share over time. By default, this is done on the basis of memory. But it can be configured to schedule with both memory… Continue reading What is a Fair Scheduler in Hadoop?
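
To make the idea of an "equal share" of memory concrete, here is a minimal Python sketch of max-min fair sharing. It is only an illustration of the concept and not the Fair Scheduler's actual code; the function name `fair_shares`, the application names, and the memory figures are all hypothetical.

```python
def fair_shares(total_memory_mb, demands_mb):
    """Split memory so every application gets an equal share, capped by its demand.

    Illustrative max-min fairness only; this is NOT how YARN implements it.
    """
    shares = {app: 0 for app in demands_mb}
    remaining = total_memory_mb
    # Applications that still want more memory than they currently hold.
    active = {app for app, demand in demands_mb.items() if demand > 0}

    while active and remaining > 0:
        equal = remaining / len(active)
        # Applications whose outstanding demand fits inside an equal slice.
        satisfied = {a for a in active if demands_mb[a] - shares[a] <= equal}
        if not satisfied:
            # Nobody can be fully satisfied: hand out equal slices and stop.
            for a in active:
                shares[a] += equal
            remaining = 0
            break
        # Fully satisfy the small applications, then re-split what is left.
        for a in satisfied:
            remaining -= demands_mb[a] - shares[a]
            shares[a] = demands_mb[a]
        active -= satisfied

    return shares


if __name__ == "__main__":
    # Hypothetical cluster with 12000 MB and three applications.
    print(fair_shares(12000, {"app_a": 2000, "app_b": 8000, "app_c": 8000}))
```

In this hypothetical run, the small application receives everything it asked for, and the two larger ones split the remaining memory evenly between them.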

Tuesday Big Data Series

What is a Capacity Scheduler in Hadoop?

Capacity Scheduler is a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster so that their applications are allocated resources in a timely manner within the constraints of their allocated capacities. Before the onset of Big Data and Hadoop, every organization had its own set of resources that had sufficient capacity to meet the… Continue reading What is a Capacity Scheduler in Hadoop?

Tuesday Big Data Series

SQL and HiveQL query examples

Well, we know that HiveQL is very similar to SQL. The detailed Hive Language Manual describes all the important functions and semantics that are used in Hive. Please find below some examples to get you started on Hive. If you have been using SQL in the past and are familiar with the semantics, see how you can use… Continue reading SQL and HiveQL query examples