
Mastering Hadoop: Book Review

I came across the book Mastering Hadoop, published by Packt and authored by Sandeep Karanth. Here is my detailed review of the book.

This book is about Hadoop, the most popular massively parallel processing (MPP) framework, and its ecosystem. It is an intermediate-level book in which the author goes in depth not only on the principal subject but also on most of the supporting ecosystem components, such as Hive, Pig and streaming. The book has 374 pages across 12 chapters; the table of contents alone spans 7 pages! It combines conceptual material with hands-on labs and plenty of code.

The book starts with the genealogy of Hadoop: the author nicely narrates the evolution of web search up to its current state, followed by the various Hadoop releases. It gives good reasoning for why Hadoop 2.0 was essential to move ahead from the previous version. It then covers the architecture, starting from the high-level three-layer view and drilling down step by step to the cluster and node level. It describes all the features of Hadoop 2.x nicely and then discusses four major Hadoop distributions.

Chapter 2 packs in advanced MapReduce (MR) concepts such as merging and spilling of intermediate outputs, stragglers, job counters and data joins. On the labs front, it explains a MapReduce example in great detail, including a custom RecordReader implementation. Some tips are really handy, such as a heuristic formula for calculating the optimum number of reducers. Note that the chapter assumes the reader already has basic knowledge of MapReduce and focuses on the advanced concepts.
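To make the map, partition, sort/group and reduce stages concrete, here is a minimal pure-Python sketch of a word count. It is not code from the book and involves no Hadoop at all; the function names are hypothetical, and crc32 is a stand-in for the Java hashCode() that Hadoop's HashPartitioner uses. The reducer-count helper follows the heuristic published in the Apache Hadoop MapReduce tutorial (0.95 or 1.75 multiplied by nodes times maximum containers per node); I am assuming the book's formula is along these lines.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple
import zlib


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key, in the spirit of Hadoop's HashPartitioner.
    crc32 stands in for Java's hashCode() so results are deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reducer: sum all counts seen for one key."""
    return key, sum(values)


def run_job(lines: List[str], num_reducers: int) -> Dict[str, int]:
    # Map + partition: every intermediate (key, value) lands in one reducer's bucket.
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_phase(line):
            buckets[partition(key, num_reducers)].append((key, value))

    # Sort/group ("shuffle"): each reducer sorts its bucket so equal keys are
    # adjacent, then calls reduce once per key group.
    result: Dict[str, int] = {}
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))
        for key, group in groupby(bucket, key=itemgetter(0)):
            word, total = reduce_phase(key, (value for _, value in group))
            result[word] = total
    return result


def suggested_reducers(nodes: int, containers_per_node: int, factor: float = 0.95) -> int:
    """Heuristic from the Apache Hadoop MapReduce tutorial:
    0.95 or 1.75 * (nodes * maximum containers per node)."""
    return max(1, int(factor * nodes * containers_per_node))


if __name__ == "__main__":
    counts = run_job(["the quick brown fox", "the lazy dog", "The end"], num_reducers=2)
    print(counts["the"])                    # 3
    print(suggested_reducers(10, 8))        # 76
    print(suggested_reducers(10, 8, 1.75))  # 140
```

Per the tutorial, with 0.95 all reduces can launch in a single wave as the maps finish, while 1.75 lets faster nodes pick up a second round of reduces for better load balancing.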

Chapter 3, on Pig, covers the execution process of Pig Latin scripts and their semantics in depth, along with many tips to optimize query performance. It also shows practical ways to use Pig for joins, as a combiner, and as an abstract data analyzer for a Directed Acyclic Graph (DAG).
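One of the join strategies Pig supports is the replicated (fragment-replicate) join, written for example as `joined = JOIN orders BY user_id, users BY id USING 'replicated';`, where every relation after the first must be small enough to fit in memory. The relation names here are hypothetical. The idea can be sketched in plain Python as below; this is a mental model under those assumptions, not Pig's implementation, and it assumes rows are tuples with the join key in field 0.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]  # hypothetical layout: the join key is always field 0


def replicated_join(big: Iterable[Row], small: Iterable[Row]) -> Iterator[Row]:
    """Fragment-replicate (map-side) inner join.

    The small relation is loaded into an in-memory hash table, the copy each
    map task would hold; the big relation is then streamed past it, so no
    shuffle or reduce phase is needed.
    """
    lookup: Dict[object, List[Row]] = {}
    for row in small:
        lookup.setdefault(row[0], []).append(row)
    for row in big:
        # Inner join: rows of the big relation with no match are dropped.
        for match in lookup.get(row[0], []):
            yield row + match[1:]  # big row (incl. key) + small row minus its key


if __name__ == "__main__":
    orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "cup")]
    users = [(1, "alice"), (2, "bob")]
    print(list(replicated_join(orders, users)))
```

Order 3 has no matching user, so it disappears from the output, just as it would in an inner join.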

Chapter 4 covers Hive, another way to query data in Hadoop using the conventional SQL-like style familiar from the RDBMS world. It is also covered in considerable detail, from its architecture to HiveQL semantics, execution steps and optimization tips such as indexing and partitioning. It also covers extension points such as UDFs, UDAFs and UDTFs.
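Partitioning pays off because Hive stores each partition of a table declared with, say, `PARTITIONED BY (dt STRING, country STRING)` in its own `column=value` subdirectory, so a query filtering on `dt = '2015-01-01'` only needs to read the matching directories. The minimal sketch below mimics that pruning step over a list of paths; the table `logs`, its location and the helper names are hypothetical, and real Hive consults the metastore rather than parsing paths like this.

```python
from typing import Dict, List


def parse_partition(path: str) -> Dict[str, str]:
    """Extract partition columns from a Hive-style path, e.g.
    /warehouse/logs/dt=2015-01-01/country=IN -> {'dt': '2015-01-01', 'country': 'IN'}."""
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            spec[column] = value
    return spec


def prune(paths: List[str], **equals: str) -> List[str]:
    """Keep only partitions whose columns match every equality predicate,
    i.e. the directories a WHERE clause on partition columns would still scan."""
    return [
        p for p in paths
        if all(parse_partition(p).get(col) == val for col, val in equals.items())
    ]


if __name__ == "__main__":
    partitions = [
        "/warehouse/logs/dt=2015-01-01/country=IN",
        "/warehouse/logs/dt=2015-01-01/country=US",
        "/warehouse/logs/dt=2015-01-02/country=IN",
    ]
    print(prune(partitions, dt="2015-01-01"))
    print(prune(partitions, dt="2015-01-02", country="IN"))
```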

The Hadoop Serialization and I/O chapter covers serialization/deserialization (SerDe) techniques. After discussing Hadoop's own implementation and the JDK's implementation, it gradually introduces Apache Avro, clearly stating its advantages and giving a detailed example. It explains the steps for Avro/Pig and Avro/Hive integration.
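The Writable versus Avro comparison is easier to appreciate with a concrete byte layout. The sketch below assumes a hypothetical `UserRecord` with an int and a string and serializes it in the Writable style: compact big-endian fields in a fixed order, with no schema stored alongside the bytes. It illustrates the idea only and does not reproduce Hadoop's exact wire format. Avro, by contrast, stores the writer's schema in the data file header, which is what makes cross-language reads and schema evolution possible.

```python
import struct
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical record type used only for this illustration."""
    user_id: int
    name: str

    def write(self) -> bytes:
        """Serialize as: 4-byte big-endian int, 4-byte big-endian length, UTF-8 bytes.
        As with a Writable, the layout is implicit: the reader must know the field order."""
        encoded = self.name.encode("utf-8")
        return struct.pack(">iI", self.user_id, len(encoded)) + encoded

    @classmethod
    def read(cls, data: bytes) -> "UserRecord":
        """Deserialize bytes produced by write()."""
        user_id, length = struct.unpack_from(">iI", data, 0)
        name = data[8:8 + length].decode("utf-8")
        return cls(user_id, name)


if __name__ == "__main__":
    record = UserRecord(42, "hadoop")
    payload = record.write()
    print(len(payload), UserRecord.read(payload))
```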

Chapters 6 and 7 cover YARN and Storm. The YARN chapter introduces the new architecture, with examples of writing a client and scheduling a job, plus ways to monitor it. The Storm chapter covers low-latency (aka real-time) processing, compares Hadoop MR with Apache Storm with the help of process diagrams, and explains the concepts of spouts, bolts and topologies through a Java-based example. It ends with installing Storm on Hadoop.
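In real Storm, spouts and bolts are Java classes (for example, subclasses of `BaseRichSpout` and `BaseBasicBolt`) wired together with `TopologyBuilder` and run in parallel across a cluster. As a rough mental model only, the pure-Python sketch below chains generators to show how tuples flow from a spout through bolts one at a time. It is single-process, with no parallelism, acking or fault tolerance, and all names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: the source of the stream; emits one tuple (here a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits one tuple per word."""
    for sentence in stream:
        for word in sentence.split():
            yield word


class CountBolt:
    """Terminal bolt: keeps running word counts, updated tuple by tuple
    instead of after a whole batch has been processed."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def consume(self, stream: Iterable[str]) -> None:
        for word in stream:
            self.counts[word] += 1


def run_topology(sentences: Iterable[str]) -> Counter:
    """Topology: wire spout -> split bolt -> count bolt."""
    counter = CountBolt()
    counter.consume(split_bolt(sentence_spout(sentences)))
    return counter.counts


if __name__ == "__main__":
    print(run_topology(["storm on yarn", "storm is fast"]))
```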

Then it moves on to off-premise (cloud-based!) Hadoop offerings such as Amazon's Elastic MapReduce (EMR) on AWS and Microsoft's HDInsight on Azure, with enough comparison points as well as enough configuration steps.

The HDFS replacements chapter debates the pros and cons of HDFS and possible alternatives, such as AWS S3, that can make Hadoop more powerful. Covering more alternative systems, such as Cassandra, Ceph and GlusterFS, would have been a value addition here.

Then it delves into HDFS Federation and Hadoop Security, the latter built on four pillars (authentication, authorization, auditing and data protection), each explained in great detail.

Then comes Chapter 12, Analytics Using Hadoop. Machine learning is a very interesting topic to include in this book, though I am not sure it needed this level of detail. You need some statistical knowledge to understand parts of the chapter, as it covers terms and algorithms such as TF-IDF and k-means clustering. At the end, it discusses the data analysis libraries RHadoop and Mahout. Overall, this chapter gives a good handle on analytics.
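For readers who want a quick refresher on the two terms before reading the chapter, here is a small pure-Python sketch of one common TF-IDF variant (term frequency = count / document length, inverse document frequency = ln(N / df)) and of plain Lloyd's k-means on one-dimensional points. It does not reproduce the chapter's formulas or its Mahout/RHadoop code, which may use different weighting or smoothing; the function names are hypothetical. Libraries like Mahout exist precisely to run these computations over data too large for one machine.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """One common TF-IDF variant: tf = count / doc length, idf = ln(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (c / len(doc)) * math.log(n / df[term])
            for term, c in counts.items()
        })
    return scores


def k_means_1d(points: List[float], centroids: List[float], iterations: int = 10) -> List[float]:
    """Plain (Lloyd's) k-means on 1-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep a centroid where it is if no point was assigned to it.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


if __name__ == "__main__":
    print(tf_idf([["hadoop", "pig"], ["hadoop", "hive"]]))
    print(k_means_1d([1, 2, 3, 10, 11, 12], [0, 5]))  # [2.0, 11.0]
```

A term such as "hadoop" that appears in every document gets an IDF of ln(1) = 0, which is exactly why TF-IDF downweights ubiquitous words.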

The book ends with an appendix, "Hadoop for MS Windows". Thanks to Hortonworks, you can now run a Hadoop distribution on the Windows platform as well as use their PaaS offering on MS Azure; the appendix gives more details.

The author clearly has rich experience in the field and succeeds in conveying the depth of the subject through this book. The source code for the book is also available on GitHub.

In a market crowded with beginners' Hadoop books, this one stands out by catering to an intermediate level. I wish this effort all the very best.

Anyone with prior knowledge of Hadoop 1.x can easily upgrade to Hadoop 2.x and YARN. That said, even readers with only a little knowledge of databases and Java can read this book to explore this new ecosystem and enhance their existing skills.

