Throughout history, the most successful business decisions have been based on interpretation of available data. Since computing came of age in the 20th century, IT solutions have increasingly been about handling large volumes of data. Technologies supporting data warehouses, data mining, and business intelligence have been with us for years, and manufacturing applications have long used analytics to respond quickly to variances found by sorting through large volumes of rapidly arriving process data.
Here are 5 things to know about the importance of performance when processing big data.
1. Today and every day, 2.5 quintillion bytes of data is created.
In fact, 90% of the data in the world was created in the last two years alone. The trend toward larger data sets reflects the additional insight that can be derived from analyzing a single large set of related data, compared with separate smaller sets totaling the same amount. Some estimates put data growth as high as 50-fold by the year 2020.
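To put those figures in perspective, a quick back-of-the-envelope conversion (using decimal SI units, which is an assumption on my part about how the estimate is stated) shows the daily and yearly scale:

```python
# Scale check on the "2.5 quintillion bytes per day" figure (decimal SI units).
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes

exabytes_per_day = BYTES_PER_DAY / 1e18          # 1 EB = 10^18 bytes
zettabytes_per_year = BYTES_PER_DAY * 365 / 1e21  # 1 ZB = 10^21 bytes

print(f"{exabytes_per_day} EB per day")           # 2.5 EB per day
print(f"{zettabytes_per_year:.2f} ZB per year")
```

In other words, roughly 2.5 exabytes every day, approaching a zettabyte per year.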
2. Studies have grouped the challenges of processing big data into four performance categories: volume, velocity, variety, and veracity.
Volume — big data solutions must manage and process larger amounts of data.
Velocity — big data solutions must process more rapidly arriving data.
Variety — big data solutions must deal with more kinds of data, both structured and unstructured.
Veracity — big data solutions must take measures to validate the correctness of large amounts of rapidly arriving data.
3. The challenges for handling big data include capture, storage, search, sharing, transfer, analysis and visualization.
Given the sheer quantity and complexity of the data, traditional database management tools and applications simply cannot keep up. With this much data to process, it is mostly innovations in distributed processing that have provided dramatic increases in our big data capabilities. Scaling out has gone a long way towards making big data solutions both feasible and manageable from a performance and capacity perspective.
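The scale-out idea behind distributed processing can be sketched in miniature: partition the data, process each shard independently, then merge the partial results. The toy word count below uses local worker processes to stand in for cluster nodes; the function names and three-record data set are illustrative assumptions, not any particular framework's API, though frameworks such as Hadoop and Spark apply the same map-and-merge pattern across many machines.

```python
# Toy illustration of scaling out: partition -> process shards in
# parallel -> merge partial results.
from collections import Counter
from multiprocessing import Pool

def count_words(shard):
    """Map step: count word occurrences within one shard of records."""
    counts = Counter()
    for record in shard:
        counts.update(record.split())
    return counts

def merge_counts(partials):
    """Reduce step: merge per-shard counts into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    records = ["big data", "fast data", "big fast big"]
    shards = [records[0:1], records[1:2], records[2:3]]  # partition the data
    with Pool(processes=3) as pool:
        partials = pool.map(count_words, shards)          # process in parallel
    print(merge_counts(partials))                         # merged word counts
```

Because each shard is processed independently, adding more workers (or more machines) increases throughput without changing the logic — the essence of scaling out.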
4. We are seeing examples every day around the world of how the capability to process big data in a high performance fashion can benefit us.
A major US retailer adds weather data to its distribution algorithms, so it can model delivery paths and draw on disparate data sources for improved logistics.
A major Indian telecommunications firm analyzes billions of call records daily to target customers with special offers, reducing churn and increasing customer loyalty.
Police in a major US city have installed traffic cameras throughout the city that can read license plates. Because many crimes are committed in stolen automobiles, the Department of Motor Vehicles uses this data to identify stolen vehicles and get them off the street in real time.
5. In IBM Smarter Planet terms, big data helps us to change the way the world works.
Big data solutions allow us to change how business is done in ways that were not possible just a few years ago by exploiting previously unused sources of information. Read more about it in the IBM Redpaper Performance and Capacity Implications for Big Data, REDP-5070.
Mike Ebbers is an IBM Redbooks Project Leader. He works with technical experts to create books, guides, blogs, and videos. Follow Mike on Twitter at @MikeEbbers.