So, we have all this data collected from various sources: web logs, social media sites, user-based interaction logs, and so on. Marketers (or businesses) want to turn this real and growing data into an opportunity. We want to be able to:
- Study or analyze this raw data to extract information and gain insights.
- Uncover the trends to study user behavior.
- Create models to predict future customer behavior.
- Target the right customer at the right time.
- And, be able to do all of this, really FAST!
The challenge with processing big data lies in how quickly we can turn a user's behavior or action into a potential customer. Can the traditional processing methods handle such huge amounts of data? Can they process the data fast enough? The answer is NO. Big data systems need to process data in parallel so that results can be turned around quickly.
Processing data piece by piece takes a lot of time. It could take days to run a simple query on the kind of data (Big data!) we are talking about. The advantage of using parallel processing in such scenarios is that it can break the problem into small chunks, which can then be solved simultaneously.
A simple analogy could be this: Say you have to count the occurrences of the word “beautiful” in a book with 20 pages. If one person is assigned this task, they will have to go through every page and keep a running count. Instead, the problem can be broken down: the 20 pages are distributed among 20 different people, and each of them simultaneously counts the occurrences of the word on the page assigned to them. The individual counts are then collected and summed up to get the total. Breaking the problem into smaller chunks and processing them simultaneously leads to better performance and faster results.
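The book analogy can be sketched in a few lines of Python. This is only an illustration: the three-page "book", the function names and the worker count are all made up for the example, and a thread pool stands in for the 20 people. Real big data systems spread this kind of work across many machines, not threads in one program.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def count_on_page(page_text, word):
    """Count whole-word, case-insensitive occurrences of `word` on one page."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, page_text, flags=re.IGNORECASE))


def count_in_book(pages, word, workers=4):
    """Hand each page to a worker, then sum the per-page counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_page = list(pool.map(lambda page: count_on_page(page, word), pages))
    return sum(per_page)


# Hypothetical three-page "book", for illustration only.
pages = [
    "A beautiful morning.",
    "Nothing to see here.",
    "Beautiful hills and a beautiful lake.",
]
total = count_in_book(pages, "beautiful")
print(total)
```

Each page is counted independently of the others, which is exactly what makes the work easy to split up: no reader needs to know what any other reader found until the final sum.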
Parallel Processing in Hadoop
Hadoop uses a parallel programming model called MapReduce that divides the processing of large volumes of data into a set of independent tasks. The data is already split into blocks stored across the machines in the cluster, so instead of moving all that data to one place for processing, Hadoop sends the computation to the machines that hold it. It divides a large task into smaller tasks, distributes these small tasks across the cluster, and then combines the results from all of them to produce the final result. Jesse Anderson has created and shared a very interesting video explaining the concept of MapReduce, which can be viewed here.
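To make the map, group and combine steps concrete, here is a minimal single-machine sketch of the MapReduce idea in plain Python. It does not use Hadoop or its API; the function names and the sample input chunks are assumptions made up for illustration. In a real cluster, each chunk would be mapped on a different machine and the grouping step would move data over the network.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently, so this step could run in parallel.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical event log split into three chunks, for illustration only.
chunks = ["cart viewed cart", "cart abandoned", "viewed"]
counts = word_count(chunks)
print(counts)
```

The map step never looks at more than one chunk and the reduce step never looks at more than one key, which is why both can be handed out as small independent tasks.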
As marketers, we need better insights to improve customer conversion rates, personalize marketing campaigns, predict customers' future behavior, acquire new customers and retain the existing ones. These trends have to be analyzed and acted on before the data gets stale. For example, an abandoned cart on a shopping website gives us scope for retargeting, but this has to be done while the user is still interested in buying that product.
Using big data solutions to process the data in parallel, and building effective marketing campaigns on the processed data, will not only target the right customer at the right time but also yield faster results and higher customer conversion rates.