A BIG problem
If you don't have a big data problem yet, you have not arrived yet. Every organization worth its salt is now grappling with huge volumes of data that it can neither discard nor ignore. This data needs to be mined effectively and efficiently for nuggets of information. The problem is, indeed, BIG. The solution is IBM InfoSphere Big Data. The IBM Big Data portfolio has two products: InfoSphere Streams and InfoSphere BigInsights. Together they tackle the three Vs of big data: Velocity, Volume and Variety.
Over the next few entries in this blog series, we will introduce the various use-case scenarios, configurations, tools and developer aids that can help you find your way around IBM Big Data.
The Three Vs
Velocity refers to the low-latency, real-time speed at which analytics needs to be applied. Examples include monitoring and analyzing weather, traffic, trading and critical healthcare data - any system where there is a continuous feed from different sources, and the analytics feedback needs to be looped back into those sources for better monitoring.
Volume refers to "internet scale". Until about 6 months back, before I heard about big data, the largest byte unit I personally knew was the gigabyte. Now we hear the terms terabytes, petabytes, quintillion bytes - I have to pause a bit before I can tell you the number of zeros following the one. Sifting through this haystack for your needles is going to be a huge task.
Variety - Data comes in all sorts of forms and is all over the place. While you have about 10% of your data in the neat structured rows of the organization's RDBMS, the other 90% of digital data that you might still be interested in for analytics is out there in the form of images, voice, email, annual text reports, logs and, in the last decade, blogs, tweets and posts.
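To put the units mentioned under Volume in perspective, here is a minimal Python sketch that counts the zeros following the one for each unit. It assumes decimal (SI) units and the short-scale meaning of "quintillion" (10^18 bytes, i.e. one exabyte); the names UNITS, zeros_after_one and bytes_in are made up for this illustration and are not part of any IBM product.

```python
# Decimal (SI) byte units mapped to their power of ten.
# Assumption: SI units, not binary (1024-based) units.
UNITS = {
    "gigabyte": 9,
    "terabyte": 12,
    "petabyte": 15,
    "exabyte": 18,  # one quintillion bytes on the short scale
}


def zeros_after_one(unit: str) -> int:
    """Return how many zeros follow the 1 when one unit is written out in bytes."""
    return UNITS[unit]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the given unit."""
    return 10 ** UNITS[unit]


if __name__ == "__main__":
    for name in UNITS:
        print(f"1 {name} = {bytes_in(name):,} bytes ({zeros_after_one(name)} zeros)")
```

Each step up the list multiplies the size by a thousand, so a petabyte is a million gigabytes.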
What is BigInsights?
Take the power of open source technologies such as Hadoop, Lucene, Pig, Hive, Flume, HBase, Oozie and Avro, plus a ZooKeeper to take care of them all. Then, facilitate the installation, integration and administration. Add a dollop of enterprise capabilities such as security, UI tools for administration and Eclipse-based IDEs. Finally, pepper with components such as BigSheets, BigIndex, Text Analytics and JAQL. What you get in the end is the powerful and versatile BigInsights solution.
What is Streams?
Depending on your use-case scenario, you can build SPL (Streams Processing Language) applications using the Streams Studio IDE and deploy them on a Streams instance. These applications ingest and process information from thousands of sources for fast, continuous analysis, so that the business user can make quick decisions. You can also make use of various ready-made toolkits aligned to specific industries such as finance and healthcare.
So why would you want to analyze all this data? How will you know which solution is pertinent to your scenario? Look out for the next entry in this blog.
Image acknowledgements: free-extras.com, aslimasti.com, blogs.technet.com