Is Big Data just hype, or are we already in the middle of it? Let's find out more.
Let's go back to the history of computers. Why were computers first used? The answer is: to do simple calculations. Then came the need, or desire, to save data, so computers were built with the capability to store it. Many applications developed on computers began to take advantage of this newfound ability to store data.
The use of computers grew with innovations in hardware and the falling cost of that hardware. New programming languages and tools emerged that helped in building software. Fields such as retail, healthcare and automobiles increasingly used computers to store their inventory, employee, sales and other important details. This paved the way for data storage systems such as the RDBMS (relational database management system), which worked well until recently. Basically, an RDBMS stores data in tables, and that data is structured.
Over the years the volume of data has grown enormously, from a few gigabytes to terabytes to petabytes. With the advent of new media and social networking sites like Facebook, Twitter and Google+, new kinds of data in the form of posts, images and videos are produced every second. This new data is unstructured, so our traditional database systems find it difficult to handle. Companies also started to track internet users' footprints to analyze information about them.
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years.
The huge amount of data gathered provides valuable insights into the history of organizations, the interests of consumers and so on, giving management information it can use to make strategic decisions.
The question that arises is: how do we make use of such huge amounts of data?
Google researched this problem and came up with a solution called MapReduce. MapReduce was first described in a research paper from Google, and it has been used by Google and Yahoo to power their web search.
The algorithm is built on the concept of parallel processing: using multiple computers to compute the desired result instead of relying on a single powerful computer. As the name suggests, MapReduce consists of two parts, "MAP" and "REDUCE". Roughly, the map step processes separate pieces of the input in parallel, and the reduce step combines those partial results into the final answer. I will explain the algorithm in more detail in a separate blog post.
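The full walkthrough is left for a later post, but a tiny word-count example may help make the two phases concrete. This is only a minimal in-memory sketch, not Google's implementation or Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a thread pool stands in for the separate machines a real cluster would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    # MAP: emit a (word, 1) pair for every word in one chunk of input
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_chunks):
    # Group every value emitted for the same key (word) together
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # REDUCE: combine all values for one key into a single result
    return key, sum(values)


def word_count(documents, workers=4):
    # Threads are a stand-in for the many computers of a real cluster
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
        groups = shuffle(mapped)
        reduced = pool.map(lambda item: reduce_phase(*item), groups.items())
        return dict(reduced)


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Each document is mapped independently, which is what lets the work be spread across machines; only the grouped pairs for a given word need to meet in one place for the reduce step.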
Other companies would find it difficult to implement the algorithm for their own problems, and doing so would be like reinventing the wheel. Then came an implementation of the algorithm called HADOOP, open-sourced under the Apache Software Foundation. The name itself is something unique. Try to guess what it could mean...
A TOY ELEPHANT
Yes, you read that right: HADOOP was the name of a toy elephant, and the project was named after it. Doug Cutting, one of the core members of the project, took the name from his kid's toy elephant. Hadoop works on top of HDFS, the Hadoop Distributed File System.
Let's discuss Hadoop and HDFS in more detail in upcoming blog posts.
Note: Investment in Big Data is growing enormously, and so is the demand for developers.