Sunday, 26 April 2015

Evolution of Big Data (Hadoop)

As we are all aware, we are now dealing with:

1. Huge Volume of data, from small to very large data sets (e.g. GPS data, weather data sets, and the massive data sent from space satellites to earth stations, where approximately 2 GB per minute is generated).
2. Huge Velocity of data, which is generated continuously (e.g. by stock markets and web logs).
3. Huge Variety of data, which does not only fit a relational or structured model but also includes images, streaming video, emails, social media content, comments, text, etc.
So we arrive at a point where we need something BIG to store all of this, and that is why we need Big Data.
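
To put the satellite figure above in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes the roughly 2 GB per minute rate is sustained around the clock and that 1 TB = 1024 GB, neither of which is stated in the original figure. The function name is made up for illustration.

```python
# Rough estimate only: assumes the ~2 GB/minute rate is constant all day.
GB_PER_MINUTE = 2
MINUTES_PER_DAY = 24 * 60


def daily_volume_gb(gb_per_minute: float = GB_PER_MINUTE) -> float:
    """Return the data volume generated in one day, in gigabytes."""
    return gb_per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    per_day_gb = daily_volume_gb()
    per_day_tb = per_day_gb / 1024  # binary terabytes (an assumption)
    print(f"{per_day_gb} GB per day, about {per_day_tb:.2f} TB")
```

Under those assumptions a single source like this produces 2,880 GB, or about 2.8 TB, every day, which quickly outgrows what one machine can comfortably store.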

Doug Cutting, Mike Cafarella, and their colleagues started working from the concepts in Google's published papers. Cutting later joined Yahoo, where the technology grew into what we now know as Hadoop.

We know how Linus Torvalds was the principal force behind the development of the Linux kernel, and how we now have different Linux distributions available to us, such as Red Hat, Ubuntu, and SUSE.

Likewise, for Big Data and Hadoop we have these major players:

Cloudera - Cloudera's Distribution Including Apache Hadoop (CDH)
Hortonworks - founded by engineers from Yahoo's Hadoop team, who added more features to Hadoop
IBM - InfoSphere BigInsights, IBM's own distribution of Hadoop
MapR, EMC, Dell, and others also have their own distributions of Hadoop

Let's look at the Apache Hadoop framework.
The base Apache Hadoop framework is composed of the following modules:

  • Hadoop Common - contains libraries and utilities needed by other Hadoop modules;
  • Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
  • Hadoop YARN - a resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users' applications; and
  • Hadoop MapReduce - a programming model for large-scale data processing.
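
To give a feel for the MapReduce programming model mentioned in the last item, here is a minimal sketch of the classic word count written in plain Python. It is not Hadoop's API and runs in a single process; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. It only shows the idea: map emits key/value pairs, a shuffle step groups them by key, and reduce combines each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values under their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big storage", "hadoop stores big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in parallel on many machines, with the input read from HDFS and the resources handed out by YARN; the sketch keeps everything in memory so that the three phases are easy to follow.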


Apache Hadoop's MapReduce and HDFS components were inspired by Google's papers on MapReduce and the Google File System.

Prominent corporate users of Hadoop include Facebook, Yahoo, and many others. It can be deployed in traditional on-site data centers but has also been implemented in public clouds such as Amazon Web Services, Microsoft Azure, and Google Compute Engine.





