In the previous post, we saw an introduction to Apache Storm, its characteristics, and its different use cases. Now, let us take a look at the Apache Storm architecture. A Storm cluster is similar to a Hadoop cluster. In a Hadoop cluster, we run what are called MapReduce jobs, but in Storm we… Continue reading How does Apache Storm work?
The ability to extract, transform, and process real-time data is critical today. In the beginning, there wasn’t any support for real-time data processing. But with growing demand and constant technological progress, quite a few technologies now support it. Apache Storm is one of them. Apache Storm is a system for… Continue reading What is Apache Storm?
MongoDB is one of the leading NoSQL databases. It’s a new generation of database. In the past, web applications used relational databases to store data. But as data grows, scalability and availability have become some of the main concerns of any web application (or organization). MongoDB was designed with web applications in mind.… Continue reading An introduction to MongoDB
So, what has changed in the past few years? The volume of data has increased tremendously. The kind of data we are dealing with has changed. We no longer deal only with plain old text. We have audio, video, images, and other complex formats of data that need to be dealt with. Number of users… Continue reading What is NoSQL?
Apache Spark is another Apache project that offers parallel data processing and can work with Hadoop to develop Big Data applications. It is a fast and general engine for large-scale data processing. Let us look at some of the features of Apache Spark one by one – Real-Time Processing: Unlike MapReduce, Spark can handle… Continue reading 5 awesome features of Apache Spark
Fair Scheduler is a pluggable scheduler for Hadoop that allows YARN applications to share resources in large clusters fairly. As the name suggests, it allocates resources such that all applications get an equal share. By default, this is done on the basis of memory. But it can be configured to schedule with both memory… Continue reading What is a Fair Scheduler in Hadoop?
Capacity Scheduler is a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under constraints of allocated capacities. Before the onset of Big Data and Hadoop, every organization had its own set of resources with sufficient capacity to meet the… Continue reading What is a Capacity Scheduler in Hadoop?
Well, we know that HiveQL is very similar to SQL. The detailed Hive Language Manual describes all the important functions and semantics that are used in Hive. Please find below some examples to get you started on Hive. If you have been using SQL in the past and are familiar with the semantics, see how you can use… Continue reading SQL and HiveQL query examples