Mastering Hadoop: Book Review

I came across the book Mastering Hadoop, published by Packt and authored by Sandeep Karanth. Here is my detailed review of the book.

This book covers Hadoop, the most popular massively parallel processing (MPP) framework, and its ecosystem. It is an intermediate-level book in which the author goes into depth not only on the principal subject but also on most of the supporting ecosystem, such as Hive, Pig, Storm, etc. The book has 374 pages and 12 chapters, and the table of contents alone spans 7 pages! It offers conceptual material as well as hands-on lab exercises, with a lot of code included.

The book starts with the genealogy of Hadoop, where the author nicely narrates the evolution of web search to its current state and then the various releases of Hadoop. It gives good reasoning as to why Hadoop 2.0 was essential to move ahead from the previous version. It covers the architecture starting from the high-level three-layer view, drilling down step by step to the cluster and node level. It describes all the features of Hadoop 2.x nicely and then talks about four major Hadoop distros.

Chapter 2 packs in the concepts of the MapReduce (MR) algorithm, such as merges and spills of intermediate outputs, stragglers, job counters, and data joins. On the labs front, it explains a MapReduce example in great detail, along with a custom RecordReader implementation. Some tips are really handy, like the heuristic formula for calculating the optimum number of reducers. Note that the chapter assumes the reader has basic knowledge of this algorithm and concentrates on the advanced concepts.
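The book's own formula is not reproduced in this review. For reference, a widely cited rule of thumb from the Apache Hadoop MapReduce tutorial sets the reducer count to 0.95 or 1.75 times (number of nodes × maximum containers per node). A minimal Python sketch of that rule follows; the function and parameter names are hypothetical and are not taken from the book or from any Hadoop API.

```python
def suggested_reducers(nodes: int, containers_per_node: int, factor: float = 0.95) -> int:
    """Rule-of-thumb reducer count: factor * (nodes * containers per node).

    factor=0.95 lets every reducer start in a single wave once the maps finish.
    factor=1.75 plans a second wave, which tends to balance load better.
    """
    if nodes < 1 or containers_per_node < 1:
        raise ValueError("cluster needs at least one node and one container per node")
    slots = nodes * containers_per_node  # total reduce capacity of the cluster
    return max(1, int(factor * slots))   # truncate, but never go below one reducer


if __name__ == "__main__":
    # Hypothetical cluster: 10 worker nodes with 8 containers each.
    print(suggested_reducers(10, 8))
    print(suggested_reducers(10, 8, factor=1.75))
```

Treat the result as a starting point only. The right number still depends on data skew and on how heavy each reduce task is.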

Chapter 3, on Pig, covers the in-depth execution process of a Pig Latin script and its semantics, along with many tips to optimize query performance. It also shows practical ways to use Pig for joins, as a combiner, and as an abstract data analyzer for a Directed Acyclic Graph (DAG).

Chapter 4 covers Hive, another way to scoop data out of Hadoop in the conventional SQL-like style of the RDBMS world. This is also covered in considerable detail, from its architecture to HiveQL semantics, execution steps, and optimization tips like indexing, partitioning, etc. It also covers extensions like UDF, UDAF and UDTF.

The chapter on Hadoop serialization and I/O covers SerDe techniques. After discussing Hadoop's own implementation and the JDK's implementation, it gradually introduces Apache Avro, clearly stating its advantages with a detailed example. It explains the steps for Avro/Pig and Avro/Hive integration.

Chapters 6 and 7 cover YARN and Storm. The YARN chapter introduces the new architecture along with an example of writing a client and scheduling a job, plus ways to monitor it. The Storm chapter covers low-latency processing (aka real-time processing), compares Hadoop MR and Apache Storm with the help of process diagrams, and explains the concepts of spout, bolt and topology with the help of a Java-based example. It ends with installation on Hadoop.

Then it moves on to off-premise (cloud-based!) Hadoop offerings like Amazon's AWS-based EMR and Microsoft's Azure-based HDInsight, with plenty of comparison points as well as configuration steps.

The chapter on Hadoop replacements debates the pros and cons of HDFS and possible extensions like AWS S3 which can make it more powerful. Adding more points on alternative systems such as Cassandra, Ceph and GlusterFS would have been a value addition here.

Then it delves into features like HDFS Federation and Hadoop security with its four pillars: authentication, authorization, auditing and data protection, each explained in great detail.

And here comes chapter 12, Analytics using Hadoop. Machine learning is a very interesting topic to have in this book, but I am not sure it needed that level of detail. You need some statistical knowledge to understand some topics in the chapter, as it covers terms and algorithms like tf-idf and k-means clustering. At the end, it covers the data analysis libraries RHadoop and Mahout. Overall this chapter provides a good handle on analytics.
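For readers who have not met tf-idf before, here is a minimal sketch in plain Python. It assumes one common variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency). The weighting used in the book or in Mahout may differ, and the function name is only illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = log(total documents / documents containing the term)
    """
    n_docs = len(docs)
    # Count, for every term, how many documents contain it at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [["hadoop", "yarn"], ["hadoop", "hive"]]
    for doc_weights in tf_idf(corpus):
        print(doc_weights)
```

A term that appears in every document gets a weight of zero under this variant, which is the point of the measure: it highlights terms that distinguish one document from the rest.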

The book ends with the appendix "Hadoop for MS Windows". Thanks to Hortonworks, you can now get a Hadoop distro on the Windows platform as well as their PaaS offering on MS Azure; more details follow in this appendix.

The author definitely seems to have rich experience in the field and succeeds in conveying the depth of the subject through this book. The source code for the book is also available on GitHub.

In the otherwise crowded field of Hadoop beginners' books, this one is different and caters to an intermediate level. I wish this effort all the very best.

Anyone with prior knowledge of Hadoop 1.x can easily upgrade to Hadoop 2.x and YARN. Even someone with a little knowledge of databases and Java can read this book to explore this new ecosystem and enhance existing skills.


