
Use MySQL Information Schema carefully!

While working on a procedure that ran on a dynamically created data table to derive a single, complex-to-compute number, I ran into a strange problem: MySQL kept eating memory and, despite 35 GB of RAM, ran out of memory after a few hours.


After debugging the algorithm, I found that we were querying information_schema.columns to get the column name for every calculation. After 10,000 such queries, MySQL would slow to a crawl and we had to restart it to release the memory. We fixed the issue by adding a static table containing the column names, populated each time the data table is created dynamically.
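A minimal sketch of that workaround follows. It is written in Python rather than as a stored procedure, and it uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs anywhere. SQLite has no information_schema, so this only illustrates the static catalog pattern itself. With MySQL you would issue equivalent CREATE/INSERT/SELECT statements through your connector. The catalog table `column_catalog` and the helpers `create_data_table` and `column_names` are hypothetical names, not taken from the original procedure.

```python
import sqlite3


def create_data_table(conn: sqlite3.Connection, table: str, columns: list[str]) -> None:
    """Create a data table dynamically and record its column names in a static catalog table."""
    # Identifiers cannot be bound as query parameters, so they are quoted here.
    # Assumption: table and column names come from trusted code, not user input.
    cols_sql = ", ".join(f'"{c}" REAL' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({cols_sql})')
    # Static catalog table, populated every time a data table is created.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS column_catalog ("
        "table_name TEXT NOT NULL, ordinal INTEGER NOT NULL, column_name TEXT NOT NULL, "
        "PRIMARY KEY (table_name, ordinal))"
    )
    conn.executemany(
        "INSERT INTO column_catalog (table_name, ordinal, column_name) VALUES (?, ?, ?)",
        [(table, i, c) for i, c in enumerate(columns, start=1)],
    )
    conn.commit()


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """Look up column names from the catalog instead of the system metadata tables."""
    rows = conn.execute(
        "SELECT column_name FROM column_catalog WHERE table_name = ? ORDER BY ordinal",
        (table,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_data_table(conn, "run_42", ["price", "qty", "discount"])
    # Every calculation reads the small catalog table rather than querying metadata.
    for _ in range(10_000):
        cols = column_names(conn, "run_42")
    print(cols)  # ['price', 'qty', 'discount']
```

The design choice is simply to pay the metadata cost once, at table-creation time, and make every later lookup an ordinary indexed read on a small user table.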


Still, it is interesting to observe this behavior of a system table that uses a cache but does not release it quickly! In fact, you can optimize your use of the information schema by following the MySQL official manual.


Popular posts from this blog

Mastering Hadoop: Book Review

I came across the book Mastering Hadoop, published by Packt and authored by Sandeep Karanth. Here is my detailed review of the book:

SUMMARY
This book is about Hadoop, the most popular massively parallel processing (MPP) framework, and its ecosystem. It is an intermediate-level book in which the author goes in depth not only on the principal subject but also on most of the supporting ecosystem components, such as Hive, Pig, streaming, etc. The book has 374 pages in 12 chapters; the table of contents alone spans 7 pages! It combines conceptual material with hands-on labs and a lot of code.


OPINION
The book starts with the genealogy of Hadoop, where the author nicely narrates the evolution of web search to its current state and then the various releases of Hadoop. It gives good reasoning as to why Hadoop 2.0 was essential to move ahead from the previous version. It covers the architecture, starting from a high-level 3-layered view and drilling down step by step to the cluster and node level. Describes all the features of Hadoo…

Hadoop Ecosystem

When it comes to Hadoop, some people still believe it is a single out-of-the-box system that caters to all big data problems. Unless you are thinking of a third-party commercial distribution, this is not correct. In reality, Hadoop on its own is just HDFS and MapReduce. If you want a production-ready Hadoop system, you will also have to consider Hadoop's friends (or components), which make it a complete big data solution.

Most of the components are Apache projects, but a few are non-Apache open source or, in some cases, even commercial. This ecosystem is continuously evolving with a large number of open source contributors. The following diagram gives a high-level overview of the Hadoop ecosystem.

Figure 1: Hadoop Ecosystem

The Hadoop ecosystem is logically divided into five layers, which are self-explanatory. Some of the ecosystem components are explained below:

Data Storage is where the raw data resides. There are multiple file systems sup…

Is blockchain a technology or an algorithm?

After the phenomenal growth of Bitcoin in 2017, all of a sudden everyone in the cyber world started talking about cryptocurrency and the technology behind it: blockchain. I am sure you too have been caught up in this new fanfare. So here I will try to explain this platform in just 11 minutes.

What is Blockchain? A blockchain is an ever-growing list of blocks of transactions, where each block is linked to its previous block and secured by a cryptographic hash. The blockchain is stored on distributed peer-to-peer (P2P) network nodes. If you are familiar with BitTorrent, you can easily understand this P2P communication. Each block has three things: data, its own hash, and the hash of the previous block.
Data can be regarded as a ledger.
Hash can be compared with a fingerprint, which uniquely identifies the block. It is generated from the block's content, and even a single-character change will make it different.
Hash to previous Block: this creates a link which can be traversed bac…
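The three parts of a block described above (data, its own hash, and the hash of the previous block) can be illustrated with a minimal Python sketch using SHA-256 from the standard hashlib module. It is a toy model under stated assumptions. The all-zeros "previous hash" for the first block and the `prev_hash|data` hashing format are illustrative choices. It omits timestamps, proof of work, and the P2P network, so it is not how Bitcoin or any real blockchain formats its blocks.

```python
import hashlib
from dataclasses import dataclass, field

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block (assumption)


@dataclass
class Block:
    data: str       # the ledger entry carried by this block
    prev_hash: str  # hash of the previous block: the link in the chain
    block_hash: str = field(init=False)  # this block's own "fingerprint"

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # SHA-256 over the link and the content, so changing either changes the hash.
        payload = f"{self.prev_hash}|{self.data}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def build_chain(records: list[str]) -> list[Block]:
    chain: list[Block] = []
    prev = GENESIS_PREV_HASH
    for record in records:
        block = Block(record, prev)
        chain.append(block)
        prev = block.block_hash
    return chain


def is_valid(chain: list[Block]) -> bool:
    for i, block in enumerate(chain):
        # The stored hash must still match the block's current content...
        if block.block_hash != block.compute_hash():
            return False
        # ...and the block must still point at its predecessor's actual hash.
        if i > 0 and block.prev_hash != chain[i - 1].block_hash:
            return False
    return True


if __name__ == "__main__":
    chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
    print(is_valid(chain))  # True
    chain[1].data = "Bob pays Carol 9"  # change a single character
    print(is_valid(chain))  # False
```

Recomputing the tampered block's hash does not help either: the next block's stored prev_hash no longer matches, so a change would have to be propagated through every later block.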