In the 1980’s, John Naisbitt wrote, “We have for the first
time an economy based on a key resource [information] that is not only
renewable, but self-generating. Running
out of it is not a problem, but drowning in it is.[i]” Little did Naisbitt know how much information
we’d be creating 30 years later. By some
estimates we are generating over 1 zettabyte (1x1021) per year[ii]. How do you avoid drowning in all that data,
and gain insights? That is the realm of
Big Data Solutions.
Center recently ran a
seminar on Big Data. We started off
talking about the ‘big data conundrum.’
The volume of data is growing so rapidly, that the fraction of data that
an enterprise can analyze is decreasing.
Because of this gap, we’re getting ‘dumber’ about our organization and
job over time. This is driving the need
for improved analytics and platform technology that can help us to process this
large volume of data.
What do customers want to do with big data? Popular requests we’ve heard include: I/T log
analytics, RFID tracking and analytics, fraud detection and modeling, risk
modeling, 360o view of a
person/place/thing, call center record analysis, and fusion of multiple
unstructured objects (e.g., pictures, audio).
Since we now collect so much data, the possibilities are only limited by
your imagination –and our ability to extract insights from the data.
In order to process these large volumes of data, special
systems and applications are being deployed.
Many of these are based on the Apache Hadoop middleware which supports a
distributed file system and processing environment for scalability,
flexibility, and fault tolerance. IBM’s
big data platform includes offerings based on Apache’s Hadoop with enhancements
to improve workload optimization, security, and cluster hardening. The IBM offering (BigInsights) also comes
packaged with advanced analytical capabilities for data visualization, text
analysis, and support machine learning analytics. One interesting item was the announcement
that the enhancements would be packaged to allow them to work with other Hadoop
distributions, such as the Cloudera™ hadoop.
Another offering discussed in the seminar was the Stream computing
offering designed to efficiently process “data in motion,” such as stock ticker
streams and social media feeds.
One of the biggest challenges given the huge volume of
information is finding the right information.
Governments, Utilities, and financial companies have this problem in
particularly because of the huge volumes they deal with. A recent IBM acquisition, Vivisimo, has
developed a next-generation search engine to provide search across multiple big
data and traditional platforms. Vivisimo
provides a scalable search application framework that can perform a federated
search across many different data sources including the web, social media,
content stores, and more traditional structured database systems. One feature that may be particularly
appealing to government agencies and corporate environments is its ability to
map individual access permissions of each data item, authenticate users against
each target system and limit access to information a user would be entitled to
view if they were directly logged into the target system.
They offer a clever search tool that provides easy
navigation and discovery, using both structured metadata (faceted search) and
keywords that the program dynamically discovers based on analysis of
unstructured content. Vivisimo provides an agile development layer, to allow
users to quickly create applications and dashboards to discover, navigate and
The seminar also featured a customer case study of using big
data for cybersecurity mission operations. IP traffic is growing at 29% CAGR, and with it,
the cyber-threats they are facing. Unfortunately, the customer’s headcount
isn’t growing, so more automated ways are need to detect and respond to threats. For this application, timeliness is key –
dealing with threats in real-time. To
identify potential threats, they want to be able to compare current threat and
traffic data to norms from the recent past, and similar periods in the
past. Their solution utilizes the
Netezza data warehouse appliance for near real-term data and IBM BigInsights
for long term storage. The solution eliminates
as many mundane “data retrieval” tasks as possible for the analyst, and provided
the analysts with those datasets that had a high probability of being
“interesting.” In this way, the solution helps the analyst deal with the
extreme data volumes, and yet remains flexible to the changing threat
Do you have an opportunity to use massive amounts of data to
accomplish a business/mission objective that can’t be done when we were limited
to small volumes of data? Do you have an
innovative solution? We’d like to hear
your stories about big data.
For more on the Big Data seminar, see our ASC website under past events.
[i] Naisbitt, John,
Megatrends: Ten New Directions Transforming Our Lives, NY Warner Communications
Company, 1982, pages 23-24
[ii] IDC Digital Universe