
Hadoop Ecosystem

Some people still think of Hadoop as a single out-of-the-box system that solves every big data problem. Unless you are thinking of a third-party commercial distribution, this is not correct. In reality, Hadoop on its own is just HDFS and MapReduce. If you want a production-ready Hadoop system, you also have to consider Hadoop's friends (or components), which together make it a complete big data solution.

Most of the components are Apache projects, but a few are non-Apache open source or, in some cases, commercial. This ecosystem is continuously evolving, with a large number of open source contributors. The following diagram gives a high-level overview of the Hadoop ecosystem.


Figure 1: Hadoop Ecosystem


The Hadoop ecosystem is logically divided into five layers which are self-explanatory. Some of the ecosystem components are explained below:


Data Storage is where the raw data resides. Hadoop supports multiple file systems, and connectors are available for data warehouses (DW) and relational databases.
HDFS is the distributed file system that comes out of the box with the Hadoop framework. It uses the TCP/IP layer for communication. An advantage of using HDFS is data awareness between the job tracker and task tracker: the job tracker schedules tasks on task trackers with knowledge of where the data is located.
The Amazon S3 file system is targeted at clusters hosted on the Amazon Elastic Compute Cloud (EC2) server-on-demand infrastructure. There is no rack awareness in this file system, as it is all remote.
MapR's maprfs offers high availability and transactionally correct snapshots, and MapR claims higher performance than HDFS. Maprfs is available as part of the MapR distribution.
HBase is a column-oriented, sparse, multidimensional database inspired by Google's BigTable. HBase provides sorted data access by partitioning data into regions. The underlying storage is HDFS.
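
To make the idea of sorted access through regions a little more concrete, here is a minimal sketch in plain Python. It does not use HBase at all. The region start keys, the region names, and the `find_region` helper are all made up for illustration, and real HBase compares row keys as byte arrays rather than Python strings.

```python
from bisect import bisect_right

# Hypothetical region layout (an assumption, not real HBase metadata).
# Each region starts at its start key and runs up to, but not including,
# the next region's start key. The empty string is the lowest possible key.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["region-1", "region-2", "region-3", "region-4"]


def find_region(row_key: str) -> str:
    """Return the name of the region whose key range contains row_key."""
    # bisect_right counts the start keys that are <= row_key;
    # the last of those belongs to the region that owns the key.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]


if __name__ == "__main__":
    for key in ["apple", "g", "monkey", "zebra"]:
        print(key, "->", find_region(key))
```

Because the keys are kept in sorted order, finding the region for a key is a binary search, and a range scan only has to touch the regions whose key ranges overlap the requested range.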

Hive is a data warehouse infrastructure with SQL-like querying capabilities on Hadoop datasets. The SQL interface makes Hive an attractive choice for developers who want to validate data quickly, as well as for product managers and analysts.
Pig is a high-level data flow platform and execution framework for parallel computation. It uses the scripting language Pig Latin. Pig scripts are automatically converted into MapReduce jobs by the Pig interpreter, so you can analyze the data in a Hadoop cluster even if you aren't familiar with Java and MapReduce.
Avro is a data serialization system that provides rich data structures, a container file format for persistent data, and remote procedure calls (RPC). It uses JSON to define data types and protocols, and serializes data in a compact binary format.
Mahout is a machine learning library whose core algorithms cover (user- and item-based) recommendation or batch-based collaborative filtering, classification, and clustering. The core algorithms are implemented on top of Apache Hadoop using the map/reduce paradigm, though Mahout can also be used outside the Hadoop world as a math library focused on linear algebra and statistics.
Sqoop is designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. It is a command-line application that supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase.
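
The incremental import idea behind Sqoop can be sketched in a few lines of Python. This is not Sqoop itself, only an illustration of the principle: remember the highest value seen in a check column and fetch only newer rows on the next run. The `orders` table, its columns, and the `incremental_import` helper are assumptions made up for this example, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3


def incremental_import(conn, table, check_column, last_value):
    """Fetch rows whose check_column exceeds last_value.

    Returns (rows, new_last_value) so the caller can store the checkpoint
    and pass it in again on the next run.
    """
    # Table and column names are interpolated for brevity in this sketch;
    # only pass trusted identifiers here.
    query = f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}"
    cursor = conn.execute(query, (last_value,))
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    if rows:
        last_value = rows[-1][columns.index(check_column)]
    return rows, last_value


# Hypothetical source table used only for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "disk"), (2, "rack")])

# First run imports everything; the second run only sees the new row.
first_batch, checkpoint = incremental_import(conn, "orders", "id", 0)
conn.execute("INSERT INTO orders VALUES (3, 'switch')")
second_batch, checkpoint = incremental_import(conn, "orders", "id", checkpoint)
```

A saved job in this picture is just the query plus the stored checkpoint, which is why it can be rerun to pick up only the rows added since the last import.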

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It is a server-based workflow engine, where a workflow is a collection of actions (such as Hadoop map/reduce, Pig, Hive, or Sqoop jobs) arranged in a control-dependency DAG (Directed Acyclic Graph). Oozie is a scalable, reliable, and extensible system.
Amazon's Elastic MapReduce (EMR) automates provisioning the Hadoop cluster, running and terminating jobs, and handling data transfer between EC2 and S3.
Chukwa is an open source data collection system for monitoring large distributed systems. It is built on top of HDFS and the Map/Reduce framework and inherits Hadoop's scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring, and analyzing results to make the best use of the collected data.
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
ZooKeeper is another Apache Software Foundation project. It provides an open source distributed coordination service, synchronization service, and naming registry for large distributed systems. ZooKeeper's architecture supports high availability through redundant services. It uses a hierarchical, file-system-like namespace, is fault tolerant and high performing, and facilitates loose coupling.
ZooKeeper is already used by many Apache projects, such as HDFS and HBase, and runs in production at Yahoo, Facebook, Rackspace, and others.
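
Going back to Oozie for a moment, the notion of actions arranged in a control-dependency DAG can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is only an illustration of the ordering idea: real Oozie workflows are defined in XML, and the action names below are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions
# that must finish before it can start.
workflow = {
    "sqoop_import": set(),
    "pig_clean": {"sqoop_import"},
    "hive_aggregate": {"pig_clean"},
    "mapreduce_index": {"pig_clean"},
    "notify": {"hive_aggregate", "mapreduce_index"},
}


def execution_order(dag):
    """Return one valid run order for the actions in dag.

    TopologicalSorter raises graphlib.CycleError if the dependencies
    contain a cycle, i.e. if the graph is not actually acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)

if __name__ == "__main__":
    print(" -> ".join(order))
```

The two middle actions have no dependency on each other, so a scheduler is free to run them in either order or in parallel, while `notify` has to wait for both.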

Data Analytics is the area where many third-party vendors provide proprietary as well as open source tools. A few of them are discussed below:
Pentaho – provides data integration (Kettle), analytics, reporting, visualization, and predictive analytics directly from Hadoop nodes. It is available with enterprise support as well as in a community edition.
Storm – is a free and open source, distributed, fault-tolerant, real-time computation system for processing unbounded streams of data.
Splunk – is an enterprise application that can perform real-time and historical search, as well as reporting and statistical analysis. It also offers a cloud-based flavor, Splunk Storm.

When setting up the Hadoop ecosystem, you can either do the setup on your own or use a third-party distribution from vendors such as Amazon, MapR, Cloudera, or Hortonworks. Third-party distributions might cost a little extra, but they take away the complexity of maintaining and supporting the system so that you can focus on the business problem.

Comments

  1. Nice summary. It would be great if you can go through a case study and demonstrate how this ecosystem works in real world applications.

    1. Thanks, Deven, for your kind words and feedback. Your idea is fantastic and I will certainly work towards that.
