
Posts

Showing posts from June, 2012

Use MySQL Information Schema carefully!

While working on a procedure that ran against a dynamically created data table to derive a single, complicated number, I observed a strange problem: MySQL kept eating memory and, in spite of 35GB of RAM, ran out of memory after a few hours.


After debugging the algorithm, we found that we were querying information_schema.columns to get the column name for each calculation. After 10,000 queries, MySQL would start crawling, and we had to restart it to release the memory. We fixed the issue by adding a static table of column names, which is populated each time the data table is created dynamically.
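The workaround can be sketched roughly as follows. This is a minimal illustration, not the original procedure: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the names `data_table_columns`, `create_data_table`, `column_names` and `monthly_data` are hypothetical. The idea is only that the column names are written to an ordinary table once, when the data table is created, and every later calculation reads them from there instead of from the system catalog.

```python
import sqlite3


def create_data_table(conn, table, columns):
    """Create a data table and record its column names in a static lookup table.

    Assumption: table and column names come from trusted code, not user input,
    because identifiers cannot be passed as bound parameters.
    """
    col_defs = ", ".join(f'"{c}" REAL' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')

    # Hypothetical static table that replaces per-calculation catalog queries.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_table_columns ("
        "table_name TEXT, ordinal INTEGER, column_name TEXT)"
    )
    # Refresh the entries for this table each time it is (re)created.
    conn.execute("DELETE FROM data_table_columns WHERE table_name = ?", (table,))
    conn.executemany(
        "INSERT INTO data_table_columns VALUES (?, ?, ?)",
        [(table, i, c) for i, c in enumerate(columns, start=1)],
    )
    conn.commit()


def column_names(conn, table):
    """Read column names from the static table, in their original order."""
    rows = conn.execute(
        "SELECT column_name FROM data_table_columns "
        "WHERE table_name = ? ORDER BY ordinal",
        (table,),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
create_data_table(conn, "monthly_data", ["volume", "price", "share"])
names = column_names(conn, "monthly_data")
```

In MySQL the same pattern would be a regular table filled by the code that creates the data table, so the calculation loop never touches information_schema.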


Still, it is interesting to observe this behavior of a system table, which uses a cache but does not release it quickly. The official MySQL manual describes how to optimize the usage of the information schema.

Big Data analysis on Cloud

It's very exciting to share a successful implementation of a MapReduce framework on Amazon's AWS infrastructure!
This was for a Fortune 500 beverage company whose input data comes from a couple of different market research companies, such as Nielsen. The first step was to get rid of Nielsen's proprietary client, Nitro, and gain more control over the monthly data analysis by storing the data in a MySQL database. That step alone brought the data analysis period down from 5 weeks to 2 weeks.
Now my team has implemented a MapReduce architecture that distributes data processing across parallel worker nodes for ETL (running on Windows) and data analysis (running on Linux). Their results are aggregated on a master node, which releases the data to the dashboard. It is represented pictorially below:
To publish the data, we use iCharts, a leading online charting system that supports various input data formats and gives users great flexibility.
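The worker and master split described above can be sketched in a few lines of Python. This is only an illustration of the map and reduce steps under simple assumptions, not the team's actual pipeline: threads stand in for separate worker nodes, and the function names, brands and volumes are all made up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(rows):
    """Worker step: total the volume per brand within one partition."""
    totals = Counter()
    for brand, volume in rows:
        totals[brand] += volume
    return totals


def reduce_totals(partials):
    """Master step: merge the per-partition totals into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return dict(combined)


def run_job(partitions, workers=4):
    """Process partitions in parallel, then aggregate on the 'master'."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_totals(partials)


# Hypothetical input: each inner list is the slice handled by one worker.
partitions = [
    [("cola", 10), ("juice", 4)],
    [("cola", 5), ("water", 7)],
]
result = run_job(partitions)
```

Adding more workers only changes how the partitions are spread out; the aggregation on the master stays the same.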
The amazing fact is that 12 days total process has com…