Sunday, 17 March 2013

What Big Data Is All About

 
With all the recent hype around the name Big Data, many people still wonder what exactly the term means. As the name suggests, "Big Data" is a collection of very large data sets. But we have been handling large data very effectively since database management systems evolved in the 1960s, using them for query processing, data analysis and many other purposes. So why did the term "Big Data" emerge at all? It emerged because of a series of changes in requirements and, above all, a change in the way we use data analysis for business intelligence, decision making, medical science and so on. These changes made data processing very difficult, even impossible (at least in terms of cost effectiveness), on traditional RDBMSs and data warehouses.

Before we move forward, let's look at a few formal definitions of Big Data given by the experts:

According to Wikipedia: "Big data usually includes data sets with sizes beyond the ability of commonly-used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set."

According to MIKE2.0, an open approach to Information Management: "A good definition of big data is to describe “big” in terms of the number of useful permutations of sources making useful querying difficult and complex interrelationships making purging difficult."

According to IDC, Big Data technologies are "a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis."

There are three main characteristics of Big Data:

  • the data itself, 
  • the analytics of the data, 
  • and the presentation of the results of the analytics. 

Then there are the products and services that can be wrapped around one or all of these Big Data elements.




 
Even with these definitions, questions remain: why did the term "Big Data" evolve, and why are traditional DBMS systems not in a state to handle this revolution? There are many reasons. The major one is undoubtedly sheer size, but beyond size there is the speed at which data is flowing in, the different sources of data with different formats, and the increasing importance of data. So Big Data is more appropriately described by 4 Vs (in most places you will find only 3 Vs):

  • Volume: the increasing size of the data.
  • Velocity: the rate at which the data is growing.
  • Variety: the different sources and formats of data, and the shift in the nature of stored data toward unstructured and semi-structured data. This is one of the biggest reasons why traditional database systems are no longer capable of handling these data sets.
  • Value: the large amount of information hidden in this unstructured data, which would have been thrown away a few years ago.


According to reports from EMC2, the world’s information is doubling every two years. In 2011 the world was projected to create a staggering 1.8 zettabytes of data. By 2020 the world will generate 50 times the amount of information and 75 times the number of "information containers", while the IT staff available to manage it will grow less than 1.5 times.
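To get a feel for what "doubling every two years" means in practice, here is a small back-of-the-envelope sketch. It assumes the 1.8 zettabyte figure for 2011 as the baseline and strict doubling every two years; the function name and constants are my own, not something taken from the report.

```python
BASE_YEAR = 2011
BASE_ZETTABYTES = 1.8   # the 2011 figure quoted above
DOUBLING_YEARS = 2.0    # assumption: strict doubling every two years

def projected_zettabytes(year):
    """Project the data volume for a year under the doubling assumption."""
    elapsed = year - BASE_YEAR
    return BASE_ZETTABYTES * 2 ** (elapsed / DOUBLING_YEARS)

# Print the projection for a few sample years.
for year in (2011, 2013, 2015, 2020):
    print(year, round(projected_zettabytes(year), 1))
```

Under this simple rule alone, 2020 comes out at roughly 22 times the 2011 volume, so the report's 50 times projection presumably rests on a different baseline or a faster growth rate.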

The key feature behind the initial success of Big Data, and its power to change the future, is that it can work with unstructured and semi-structured data, which traditional database systems can't handle.
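To make "semi-structured" a little more concrete, here is a minimal sketch in Python. The records are made up for illustration (hypothetical web-log and sensor entries), and the helper names are my own. It only shows why such data is an awkward fit for a table with one fixed set of columns; it is not a picture of how any particular Big Data product works.

```python
import json

# Hypothetical records: note that no two of them share the same fields.
raw_records = [
    '{"user": "alice", "action": "login", "device": {"os": "android"}}',
    '{"user": "bob", "action": "purchase", "amount": 19.99}',
    '{"sensor_id": 42, "temperature": 21.5, "tags": ["rfid", "warehouse"]}',
]

def field_names(records):
    """Return the sorted union of top-level field names across all records."""
    names = set()
    for line in records:
        names.update(json.loads(line).keys())
    return sorted(names)

def rows_fitting_schema(records, columns):
    """Return the records whose top-level fields exactly match a fixed column list."""
    wanted = set(columns)
    return [r for r in map(json.loads, records) if set(r.keys()) == wanted]

# Seven different fields show up across just three records...
print(field_names(raw_records))
# ...and none of the records fits a fixed two-column (user, action) table.
print(len(rows_fitting_schema(raw_records, ["user", "action"])))
```

A fixed schema would either reject these records or need a new, mostly empty column for every field that ever appears, and nested values like "device" and "tags" do not fit a flat column at all.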


A few of the areas where Big Data shows up: Big Science, web logs, RFID, sensor networks, social networks, social data (due to the social data revolution), Internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, medical records, photography archives, video archives, and large-scale e-commerce.

So does this mean Big Data solutions will push out traditional database systems and data warehouses?

No. Data in an RDBMS and Big Data solutions complement each other, like the two hands of a person playing baseball: one hand is needed for catching the ball and the other for throwing it, and a single hand can't do the work.


The challenge for us is to combine the features of the existing systems to provide a better solution to the current problem.

Since this technology is relatively new, there are other challenges as well, such as:
  • Heterogeneity
  • Scale
  • Timeliness
  • Privacy
  • Human Collaboration 
Apart from these there are many other challenges of equal importance. One, which I have already mentioned, is that the number of IT professionals available to manage the data will not keep pace with its growth.
According to a McKinsey report, the US alone will need 140,000 to 190,000 workers with deep analytical skills and around 1.5 million data-literate managers.

The PCAST report on Networking and Information Technology R&D identified big data as a "research frontier" that can "accelerate progress across a broad range of priorities."

This is where upcoming IT and data professionals can build their future for a smarter and better tomorrow...

Please don't forget to leave your precious comments.