

Showing posts from August, 2013

Hadoop Ecosystem

When it comes to Hadoop, some people still believe it is a single out-of-the-box system that solves every big data problem. Unless you are considering a third-party commercial distribution, this is not correct. In reality, Hadoop on its own is just HDFS and MapReduce. If you want a production-ready Hadoop system, you will also have to consider Hadoop's friends (or components), which together make it a complete big data solution. Most of these components are Apache projects, but a few are non-Apache open source projects or, in some cases, commercial products. The ecosystem is continuously evolving, with a large number of open source contributors.

The following diagram gives a high-level overview of the Hadoop ecosystem.

Figure 1: Hadoop Ecosystem

The Hadoop ecosystem is logically divided into five layers, which are self-explanatory. Some of the ecosystem components are explained below:

Data Storage is where the raw data resides. There are mul