Contestants on “Jeopardy!” have to understand the clues in
light of the Category (context) and then quickly sift through their accumulated
knowledge over their lifetime to come up with potential answers, decide if they
have confidence in their answer, and then quickly respond by hitting the
buzzer. Except for the buzzer, does
that sound familiar and relevant to your job? Would a computer that could do all that plus
cite the evidence backing up its answer be helpful to your job? If so, then read on.
At last week’s Analytics
Solution Center June seminar, David Ferrucci from IBM Research described
the IBM Research Grand Challenge of making a computer that can win at the game
of “Jeopardy!” (For more on “Watson,” the computer behind the challenge, check
the New York Times Magazine cover article from last week.)
Les Drieling, the keynote speaker and former US Government Intelligence
Agency senior scientist (now an IBM executive in the Global Business Services NISC
business unit), highlighted the needs in the intelligence community to sift
through very large volumes of structured and unstructured data with the goal of
making very important (some life and death) decisions under extreme time
pressure. It might seem obvious that the
DeepQA technology behind the “Jeopardy!”
project can be used to help intelligence analysts to filter and retrieve information using natural
But don’t many agencies have the need to retrieve
information quickly and precisely in response to questions? For example, the Coast Guard may have wanted
to know what dispersants are best to use in the open ocean to fight the Gulf Coast
oil spill, what evidence exists as to their efficacy and safety, and a
confidence value on the proposed answer.
The open government movement is also spurring use of this
sort of technology to help its citizens.
For example, a citizen could use it to ask about the status of a new law or
who to contact with regard to a particular problem.
Let’s say the citizen’s question was “How to get my car
emissions tested when I’m away at college out of state and this state doesn’t
do emissions testing?” A question &
answer system would have to be able to decompose the query, search its database
for possible answers, and then select the best answer to return to the citizen.
If all the questions are known ahead of time, then government personnel can
develop a simple look-up system.
However, when there are so many questions that they cannot all be
itemized ahead of time, a more sophisticated approach is required. This is where the IBM DeepQA technology comes
into play.
Text Analytics underlies DeepQA as well as the other presentations
at the June Analytics seminar. Another presentation showed how the National
Highway Traffic Safety Administration (NHTSA) Defects and Recalls database could use a
text analytics tool to alert on the possible connection between Toyotas and
rapid acceleration much earlier than this pairing came to the NHTSA’s, or the
public’s, attention. The tool used for
this demonstration was the IBM
Content Analyzer, which enables you to search, discover, and perform
the same analytics on your textual data that is done with structured data. In the demonstration it identified the
unusual relationship between “Toyota”
and “acceleration” – automatically.
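One simple way to surface an unusual pairing such as “Toyota” and “acceleration” is to compare how often two terms appear together with how often they would if they were unrelated. The sketch below illustrates that idea only; it is not how IBM Content Analyzer works internally, and the function name, the minimum-count cutoff, and the complaint records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_pair_lift(records, min_count=2):
    """Return {(term_a, term_b): lift} for pairs seen at least min_count times.

    lift = P(a and b) / (P(a) * P(b)). Values well above 1 mean the pair
    shows up together more often than independence would suggest.
    """
    n = len(records)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        unique = sorted(set(terms))          # count each term once per record
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): (count / n) / ((term_counts[a] / n) * (term_counts[b] / n))
        for (a, b), count in pair_counts.items()
        if count >= min_count                # ignore pairs too rare to trust
    }


# Hypothetical, hand-made complaint records (vehicle make + symptom keyword).
complaints = [
    ["toyota", "acceleration"],
    ["toyota", "acceleration"],
    ["toyota", "acceleration"],
    ["toyota", "acceleration"],
    ["toyota", "brakes"],
    ["ford", "brakes"],
    ["ford", "brakes"],
    ["ford", "steering"],
    ["honda", "brakes"],
    ["honda", "steering"],
]

lifts = term_pair_lift(complaints)
top_pair = max(lifts, key=lifts.get)
print(top_pair, round(lifts[top_pair], 2))
```

The minimum-count cutoff matters: without it, a pair seen only once between two rare terms can score a very high lift by accident.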
Another example showed how an Intelligence Agency was using
text analysis to analyze the performance of its counter terrorism efforts. In this example, report text was analyzed and
scored to determine whether their objectives were being met and which intelligence
methods were most helpful in meeting their objectives.
Finally, IBM showed how text analytics could be used to
discover major themes that occurred when USAID (US Agency for International
Development) ran a “Jam” that asked participants from around the globe to
propose “pragmatic ideas and solutions to some very real issues and problems
facing our communities and our world today” (this is from the Global Pulse 2010
Website). The “Jam” tools quickly identified and classified the participants’
key ideas in real time so that later participants could join the conversation
on their topics of interest.
The Category is “Government.” For $100, the clue is “Text Analytics.” <Buzz> “What can help the Government make better and
faster decisions from our mounds of unstructured data?”
To see the charts and listen to the replay from the seminar
go to www.ibm.com/ASCdc and look under
Give me your comments on how this technology can help you and
your agency or write to me at email@example.com.
- Frank Stein, Director of IBM's Analytics Solution Center, Washington, D.C.
As government leaders, do you believe the world is getting
more complex? More volatile? If so, you’re not alone. Sixty percent of
the CEOs surveyed by IBM in our 2010 CEO Study thought the world was getting
more complex, and even more, 69%, felt the world was getting more
For the first time, we also posed a similar set of questions
to college students. These future
leaders viewed the world as even more complex than the CEOs we surveyed. But
they saw less volatility, and significantly less uncertainty than the CEOs (65%
of the CEOs, but only 48% of the students).
Could it be that the students are more acclimated to economic boom/bust
cycles and feel more comfortable with the uncertainty of today’s world?
Or could it be that in the instrumented, interconnected,
collaborative world that they are used to (most of the students never knew a
world without web browsing and many don’t remember the pre-Facebook era), they
feel more comfortable dealing with this complex world? As a student in France put it, “We will have more
information, so it [the world] should be more predictable.”
We found that students who had the greatest sense of
complexity put much more emphasis on the analytics and predictive capabilities
of information. They were 50% more
likely to expect significant impact from increased information than peers who
did not have the same sense of complexity.
And they were 22% more likely to believe that organizations should focus
on insight and intelligence to enable their strategies. Also,
interestingly, students in China
were significantly more likely to prefer a fact- and research-based style of
decision making than their peers around the world. Does that indicate that the Chinese students
have been trained to feel more comfortable dealing with data than their
With the baby boom heading towards retirement in the coming
years, does this mean the government workers who replace them will be more
comfortable using information and analytical techniques to handle the world’s
problems? Or could it be that complexity
will always rise to be just beyond our ability to manage it with our current
level of technology?
Click here to see the IBM Report: “Inheriting
a complex world”
Click here to see the IBM Report: “2010 Global CEO
More on Analytics for Government here: www.ibm.com/ASCdc
Do you think our future leaders are inheriting a more
complex world? And do you feel they are
more prepared to manage it?
Comment on this blog or write to me at ASCdc@us.ibm.com
Frank Stein, Director of IBM’s Analytics Solution
Many of you probably saw the news about the Beltway Blockage
on the afternoon of July 8th; some of you may have been stuck in
the traffic like I was.
I had just read IBM’s new report, “The Globalization of Traffic Congestion: IBM
2010 Commuter Pain Survey,” but it was little consolation knowing that
traffic delays in Moscow were on average 2.5 hours, even as I watched my
commute time inch towards the second hour.
Transportation is a key governmental function that has
enormous impact on citizens’ well-being.
Traffic congestion adds stress to our lives, retards economic
development, and impacts the environment.
Performance Management is the mandate of the day for
governments at the federal, state, and local levels. In the past, many government agencies would
measure performance by counting things such as the number of roads resurfaced, the number of traffic
lights installed, and the number of dollars spent on transportation. These were input data elements. A more recent focus, and one that is more
meaningful to citizens, is to measure the outcomes achieved by the government
agencies. In the case of transportation,
an outcome might be the average commute time from one location to another, the
average speed on a roadway, or the volume of traffic (or persons) carried by a
road segment during the peak traffic hour.
Reporting on outcomes is but the first step. The performance achieved must be compared to
the desired quality of service (QoS).
Setting of the QoS goals for transportation and other government
functions is worthy of a public debate because there are invariably tradeoffs,
the major one being how much more one is willing to pay to achieve a better
Another step that can be done with the outcome data is to
determine trends and predict what might happen if the trends continue. We call this Predictive Analytics. We can plan the transportation infrastructure
that will be needed if Washington’s
growth continues at the current rate (except for 2008, we have grown
Additionally, the performance data can be analyzed to find patterns. Does the QoS fall short only in certain spots
or at a certain time of day? Why is this
happening? We can build models of the
traffic flows and run simulations to allow us to ask questions such as “Would
an extra off-ramp lane prevent the exiting traffic from backing up on the
Beltway?” Or “Would running an extra
lane Southbound in the morning improve the traffic flows?”
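The question of whether the QoS falls short only at certain times of day can be answered with a very small amount of code once outcome data is collected. The sketch below groups commute-time samples by hour and reports the hours whose average misses a target; the function name, the 35-minute target, and the sample readings are hypothetical and stand in for real measurements.

```python
from collections import defaultdict
from statistics import mean


def qos_shortfalls(observations, target_minutes):
    """Group commute times by hour of day and return {hour: average}
    for the hours whose average commute exceeds the target."""
    by_hour = defaultdict(list)
    for hour, minutes in observations:
        by_hour[hour].append(minutes)
    return {
        hour: mean(times)
        for hour, times in sorted(by_hour.items())
        if mean(times) > target_minutes
    }


# Hypothetical (hour of day, commute minutes) samples for one road segment.
samples = [
    (7, 28), (7, 32),
    (8, 41), (8, 47),
    (12, 22),
    (17, 44), (17, 52),
    (21, 20),
]

late_hours = qos_shortfalls(samples, target_minutes=35)
print(late_hours)
```

The same grouping works for "certain spots" by keying on a road segment identifier instead of the hour.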
Getting back to the recent tractor-trailer accident, has
anyone done any modeling and simulation of what might happen if I-495 were
blocked by an accident, or by a terrorist action? Do we have alternate routes identified? Do we have the computer systems to redirect
traffic to these alternate routes and to dynamically change the traffic
patterns on certain roads to facilitate the flow of traffic in what may be
If you’d like to voice your opinion about the traffic
situation in your city, fill in our online questionnaire, "Traffic Survey." Disclaimer:
This is not intended to be a scientific, randomized survey, and I make
no claim to its validity. However, I
will publish the results in a future blog, if we get enough interest in the
Give me your thoughts on how analytics might be used to
improve our traffic situation. Write to
me at firstname.lastname@example.org or respond to this
-Frank Stein, Director, IBM’s Analytics Solution
More on Analytics at our website www.ibm.com/ASCdc
The news last week was all about the weak job market.
Fed Chairman Ben Bernanke characterized the
job market as showing “continuing weakness.”
Well, guess what? The job market
for those with Analytics skills is very hot.
Monster has over 1000 job listings for Business Analytics jobs.
Here at IBM, we have over 100 openings for Business
Analytics and Optimization jobs. Some of
these are associated with our Public Sector Practice, consulting to Federal,
State, and Local Governments or developing data-intensive, analytics solutions
to help them perform their mission.
Why are there so many jobs in this field? Businesses and governments today must figure
out how to do more with less.
Organizations can analyze data coming from their business processes to
develop new approaches to streamlining or even optimizing their business. In the past, many decisions involved in
running an organization were based on “gut instinct.” Today, it is no longer defensible to make
decisions in this way when it is possible to make “fact-based” decisions using
hard data. Data stored in a Business
Intelligence system can be used by every level of an organization to help staff
understand their business better, detect problems, and develop solutions that
will allow them to accomplish their mission better, cheaper and faster. Sophisticated analysis can be done on the
data to predict what will happen if the current trends continue, determine how
to achieve the best outcome, and study the impact of external uncertainties
such as the economy or the weather.
According to the Bureau of Labor Statistics’
2010-2011 Occupational Outlook Handbook, the employment of operations
research analysts is expected to grow 22 percent over the 2008-18 period. While not all analytics jobs require an
operations research degree, this gives a good indication of the long term
trend. We know that technology is continuing
to improve both in terms of raw compute power and in the design of efficient
algorithms to analyze and optimize solutions.
This increasing capability will drive the demand to add “smarts” to many
more systems and processes, and will drive the need for analysts who can apply
the technology. So analytics isn’t just
a good short term career choice, but a good target for long-term career
Doing these jobs, though, requires in-depth skills and
knowledge. Skills in operations research
(OR) techniques, data mining, optimization, decision theory, and data analysis
are needed, along with some background in IT systems. The ideal candidate will also have some
domain knowledge about government or business functional areas since it is very
hard to apply the mathematical techniques in the abstract.
How to Find Analytics Jobs
Most Analytics jobs aren’t listed under “analytics” and many
won’t even come up under that keyword.
Use search terms like “business intelligence,” “performance management,”
“optimization,” and “operations
research.” If you have experience with
actual analytics software such as Cognos, SPSS, Intelligent Miner, or ILOG,
both Monster and IBM’s website return hits on those keywords.
Want to learn more about jobs at IBM in Business Analytics?
Go to www.ibm.com/employment
and click on the “Search for Jobs at IBM” link.
You may also write me at ASCdc@us.ibm.com
Analytics Solutions Center of Washington, D.C. Director
on the weather and climate occurring around the world in
Because weather fascinates many
of us and is experienced by all of us, the report provides good examples of how
data can be analyzed, reported, and visualized.
The first observation I'd make is that the report focuses on
unusual or anomalous events. It tries to
put them in historical context. For
example, I learned that the U.S.
had its wettest October since record keeping began 115 years ago. And Toronto
had a snow-free November for the first time in recorded history. In all data analysis - weather data,
financial data, or performance data – it is important to pull out the
significant events from the rest of the data, or the “noise” as we say. NOAA does this by comparing the past year’s
data with their historical data to find out where the year stood in comparison
to all the other years. Similar analysis
can be done by other agencies whether the metric is road-miles constructed or the percentage of students receiving student aid who graduated. What is key is comparing the results in light of the historical data and trying to gain insights on what the trend is and what it means.
Because 2009 was the end of the decade, they have also
compiled some data at the decade level rather than at the year level. While 2009 was
the fifth warmest year on record by average global temperature, the 2000-2009 decade was the warmest on
record for the globe. And the decade
before, 1990 – 1999, was the warmest on record at that time. The use of multi-year averages is a good
example of smoothing that can be done to help ferret out significant
information and remove the year-to-year fluctuation in large collections of
time series data. The graph showing the
decade data makes the trend very obvious (Source NOAA report, chapter 2).
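The smoothing described above is easy to try on any yearly series. The sketch below averages yearly values into decade buckets; the function name and the anomaly values are made up for illustration and are not taken from the NOAA report.

```python
from collections import defaultdict


def decade_means(yearly_values):
    """Average a {year: value} series into {decade_start_year: mean}."""
    by_decade = defaultdict(list)
    for year, value in yearly_values.items():
        by_decade[year // 10 * 10].append(value)   # 1998 -> 1990, 2005 -> 2000
    return {
        decade: sum(values) / len(values)
        for decade, values in sorted(by_decade.items())
    }


# Hypothetical yearly temperature anomalies (degrees C), deliberately noisy.
yearly = {
    1988: 0.20, 1989: 0.10,
    1990: 0.30, 1991: 0.20, 1998: 0.50, 1999: 0.20,
    2000: 0.30, 2005: 0.60, 2009: 0.45,
}

means = decade_means(yearly)
for decade, value in means.items():
    print(f"{decade}s: {value:.2f}")
```

In the made-up data, individual years jump up and down (1998 is warmer than 2000), yet the decade averages rise steadily, which is the effect the multi-year view is meant to expose.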
Many of the charts in the report show the yearly results as
a delta from the long term average, e.g. last year’s average surface
temperature was 0.5 °C above the 1961-1990 average using NASA/GISS data. By graphing
the time series data against the long term average, the anomalies stand out. Other
charts show the actual values and it is possible to discern trends in the data.
For example, the lower tropospheric temperatures are increasing by
approximately 0.15 °C per decade. One can
use this information to predict the climate for future decades, which could
have value for policy purposes.
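Both techniques in this paragraph, expressing values as a delta from a baseline average and estimating a per-decade trend, fit in a few lines. The sketch below uses an ordinary least-squares slope; the function names are hypothetical, and the series is synthetic, constructed to rise by exactly 0.015 °C per year so the result is easy to check. It is not NOAA data.

```python
def anomalies(series, baseline_years):
    """Express each value in {year: value} as a delta from the baseline mean."""
    baseline = sum(series[year] for year in baseline_years) / len(baseline_years)
    return {year: value - baseline for year, value in series.items()}


def trend_per_decade(series):
    """Least-squares slope of value against year, scaled to a 10-year change."""
    years = list(series)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(series.values()) / n
    sxy = sum((x - mean_x) * (series[x] - mean_y) for x in years)
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10 * sxy / sxx


# Synthetic series: 14.0 degrees C in 1980, rising 0.015 degrees C per year.
temps = {year: 14.0 + 0.015 * (year - 1980) for year in range(1980, 2010)}

deltas = anomalies(temps, baseline_years=range(1980, 1990))
slope = trend_per_decade(temps)
print(f"2009 anomaly vs 1980-1989: {deltas[2009]:+.4f} C")
print(f"trend: {slope:.3f} C per decade")
```

Extending the fitted line forward is the simplest form of prediction; real climate projections rely on physical models rather than a straight-line fit.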
The report also highlights the very strong monthly and seasonal variability in the U.S. surface temperatures in 2009 that would be obscured if one just looked at yearly averages. Another analytical technique, modeling, can be used to help analyze the "why" behind the data. Why did 2009 show such strong variability? The report indicates that in 2009 the global climate switched from the La Niña conditions that dominated 2008 to El Niño sea surface temperature (SST) conditions in the tropical Pacific Ocean. Was this the cause? NOAA global climate models were subjected to the Pacific SST observed data and the results are shown below. While not all of the variability appears to be explained by the model, the warm first quarter over the Great Plains and cold summer seem mostly consistent with the impact of La Niña during the winter and El Niño during the summer.
The State of the Climate report shows good examples of many
data analysis techniques including historical analysis, near-real time
reporting, reanalysis of past data using newer, improved techniques, averaging
of multiple datasets to improve reliability, and drill down capabilities from
decades, to years, to seasons, to months,
and from global to regional to country and state. They also use
interesting visualization techniques.
Those interested in data analysis, as well as weather,
should download this report from the NOAA Website (Arndt, D.S., M.O. Baringer, and M.R. Johnson, Eds., 2010: State of the Climate in 2009, Bull. Amer. Meteor. Soc.). (Note: While NOAA does use IBM technology in its research, the report does not state which technology is used in the reported climate studies and I don't intend to imply any relationship between this report and IBM.)
For further information on Analytics,
including our fall schedule of events, please visit the Analytics Solution
Center website at www.ibm.com/ASCdc.
If your agency uses analytics in interesting and novel ways,
I'd like to hear from you. Please write to me at ASCdc@us.ibm.com.
Frank Stein, Director
Apparently, pretty good, according to Nucleus Research. They recently completed two ROI case studies of government analytics projects. Both showed impressive results:
- Alameda County Social Service Agency's Social Services Integrated Reporting System (SSIRS) had an ROI of 631% and a payback of 2 months
- Memphis Police Department's Blue CRUSH (Crime Reduction Utilizing Statistical History) had an ROI of 863% and a payback of 2.7 months
The ROI calculations may even be conservative as Nucleus Research appears to assume that the agency and department will pay taxes on the annual benefits from the solutions.
The SSIRS system helped Alameda County reduce overpayments to non-compliant citizens, improve its win rates when claimants appealed discontinuation of benefits, and improve caseworker productivity. The system is essentially a Business Intelligence solution giving the caseworkers access to information about their clients, with dashboard and drill down capabilities. It also provides the caseworkers and managers with immediate information on "how am I doing?". Providing caseworkers with information on their clients' work participation rate and other performance metrics was key to improving the performance of the social service agency. The solution combined Cognos Business Intelligence, InfoSphere Identity Insight, and an InfoSphere warehouse to hold all the data. Identity Insight helps the caseworkers track the relationships between the various clients (e.g., parent/child) that may impact services offered. Here is a video where Don Edwards, Assistant Agency Director, talks about the solution: YouTube Video
The Blue CRUSH solution helped the Memphis Police Department (MPD) to identify crime "hot spots" and then target these areas for increased attention. As a result, MPD has reduced violent crime without additional staffing. The solution uses IBM SPSS Predictive Analytics software to analyze crime data pertaining to type of criminal offense, time of day, day of week, location, and the weather. The solution was developed with the assistance of the University of Memphis Department of Criminology and Criminal Justice.
The Memphis Police Department received a national award from Nucleus
Research for this solution. They were one of only
ten companies and governmental agencies to receive the Nucleus
Research ROI award. Out of 350 technology projects that were
submitted, the Memphis Police Department was one of only two
governmental agencies to receive an award. The other governmental
agency was the US State Department.
If you'd like more information on these two case studies please contact me at email@example.com.
More information about Analytics, including our Fall Analytics Seminar Series,
can be found at www.ibm.com/ASCdc
Director, Analytics Solution Center
Sam Palmisano, Chairman and
CEO of IBM, got together with Michael Dell, Chairman and CEO of Dell, to release
an op-ed piece last week arguing that the government can save $1 trillion through the
use of IT.
for the statement.
Jeffrey Zients, Chief Performance Officer,
penned a blog
shortly thereafter titled, “Seeing Eye to Eye with the Tech CEO Council.”
Many of the
examples cited in the Palmisano/Dell statement relate to the use of analytics:
- Consolidating the government’s myriad supply
chains is likely to save $500 billion.
- Applying advanced analytics to reduce fraud
and error in federal grants, food stamps, Medicare payments, tax refunds
and other programs could save $200 billion.
- Using predictive technology, New York State is validating
tax refund requests and saving $889 million by catching phony refunds.
- Identifying suspicious Medicare activity using
analytics has shown North Carolina how to save $25 million in just three months.
In addition to
helping to uncover fraud, waste, and abuse, I’d like to suggest 3 other ways analytics
can help the government to save money.
- Streamlining Processes: Analytics can help streamline and
optimize programs, reducing the costs of implementation while improving
service to citizens. For example,
IBM worked with Social Security to streamline their processing of
disability claims so that the majority of claims can be expedited with
little risk of allowing through unacceptable claims.
- Managing Performance: Performance management solutions can
help the management and staff of agencies to know their up-to-date
performance, and quickly spot and trouble-shoot performance issues before
they become major problems. Performance management can also help identify
successful approaches that can be replicated throughout and across
- Better decision-making: Analytics can help
agencies decide which programs to fund or the most effective
approach to take for a particular program.
By using modeling, simulation, and other data-driven approaches,
agency staff can make decisions that both save the taxpayers’ money and
deliver the best results. For
example, by modeling and optimizing the US Postal Service transportation
network, USPS is able to increase utilization of assets and save hundreds
of millions of dollars.
I’d like to hear
your ideas for how agencies can save money through employing analytics. Write to me at firstname.lastname@example.org.
See our website for
further information on using analytics in government: www.ibm.com/ASCdc
Director, Analytics Solution Center
Watson is the only computer on the planet that can answer a Jeopardy!
question in less than three seconds - fast enough to be competitive with the
world’s best human players.
(For those of you that missed the match, click here
to see a video clip from the match.)
But can a Watson-like computer help the government?
Watson was optimized to tackle a specific challenge:
competing against the world’s best Jeopardy! contestants. It does this by sifting through large amounts of unstructured information to find potential answers and assigning a confidence measure to each potential answer. When it has high confidence in an answer, it will buzz in and offer the answer. Beyond Jeopardy!,
IBM is working to deploy this technology
to businesses and governments dealing with the information overload
problem. At work, few of us are like
Ken Jennings, able to instantly answer almost every question thrown at us
with an 80-90% success rate. There is
simply too much information and more information is coming in all the
time. Whether we’re in finance, HR, IT,
or another area, our success at work depends upon dealing with huge volumes of
information, sifting through it to find
the “good information”, and then using the information to make decisions to do our
job. Technology like that used in Watson can provide for our consideration potential answers as well as the "evidence" it used to come up with potential answers.
In discussions recently with some of our military colleagues,
they came up with numerous ideas for deploying Watson-like technology. They cited the problem of “request overload”: dealing with all the
requests for Predator and similar UAV missions.
How could they deploy their limited resources to best effect? Another person mentioned the problem of
sifting through all the intelligence information, most of it in unstructured
formats such as video and text, to find the information
relevant to a mission they were planning.
Another discussed the problem of monitoring their “situational
awareness” and how hard it was to keep track of all the data coming in. “Could Watson help monitor our security
posture and alert us to potential threats?” asked another.
Are you dealing with massive amounts of information? How could a Watson-like system assist you at
work? Do you want to recruit Watson to
work for your agency? We want to hear
your thoughts either in this blog or directly.
Write to me at email@example.com.
We’re hosting 2 free Watson Overview Briefings
on July 26 and 27. More
information at our website: www.ibm.com/ascdc
Frank Stein, Director, Analytics Solution Center
In these tough fiscal times, all agencies are going to be
focusing on doing more with less. How
does one get more done with less budget and staff? Consider turning to Analytics.
The consulting firm Nucleus Research has been looking at the
Return on Investment (ROI)
for various types of IT projects.
According to David O’Connell, Principal Analyst at Nucleus Research, “projects
involving analytics have some of the highest ROIs of any projects studied.”
Nucleus Research recently studied an analytics project IBM performed at DC
Water, the local water authority for Washington,
DC. In 2008, IBM began a first-of-a-kind project
using advanced analytics to create a smarter water system that analyzes data on
valves, storm drains, service vehicles, truck routes and more to optimize its
infrastructure. With some pipes and other assets that date to the Civil War,
maintaining high levels of service while replacing older infrastructure is an
The project has resulted in the following benefits from a combination of IBM
Asset Management and Analytics technology and services:
Field Services trucks can be automatically
routed to optimize work management. This results in more work orders being
completed each week, as well as up to 20 percent reduction of fuel costs
related to fewer truck rolls and reduced "windshield" time.
DC Water recaptured $3.8 M in revenue that had been lost to defective or
degrading water meters, because the analytics behind
the advanced metering infrastructure deliver more timely identification and
replacement of those meters. Revenue was
also recaptured because DC Water can now identify and bill locations where
there is unmetered water usage.
DC Water has been able to identify
assets most critically in need of repair using predictive analytics, so aging
infrastructure replacement programs can be more accurately scheduled,
preventing costly incidents that reduce service quality, such as outages and
water main breaks. This reduces both
maintenance labor costs and call center
costs associated with emergency incidents.
Nucleus Research reported in its case
study that the DC Water project resulted in $19.677 M of benefits over 3
years with a cost of $883 K, giving an ROI of 629%.
In 2010, Nucleus Research studied a number of other public
sector analytics projects. The results
from these projects are shown in the chart below. On average, the analytics projects have
resulted in an ROI of almost 600%! This
means that over 3 years, the projects have returned benefits 6 times the
original cost of the projects. The
payback period has been less than a year in all cases. This is important to government agencies because
it means you can see cost savings in the same fiscal year that you invest in an
According to David O’Connell, Principal Analyst at Nucleus
Research, “When government entities adopt
analytics, returns are high for two reasons.
First, waste such as leaky water mains, defective meters, or benefits
overpayments can be identified and eliminated.
Second, by making information more readily available, employees spend
less time looking around for information and more time getting their jobs done.” O’Connell went on to say, “Another improvement is better use of
workers’ time. The more an organization
knows about the public it serves, their needs, and the means of delivering
service, the smarter managers’ decisions are when they hand out workers’
Has your agency implemented any analytics projects? What’s been your experience?
Don't feel comfortable sharing
publicly? I'd be happy to hear your thoughts directly as well (firstname.lastname@example.org).
(net savings year 1 + net savings year 2 + net savings year 3)/3 * 100
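The expression above reads most naturally as the numerator of a three-year average ROI. The sketch below assumes the average annual net savings is then divided by the initial cost before being turned into a percentage; that denominator is an assumption here, as are the function names, the even-accrual payback rule, and all of the figures. It is not meant to reproduce the case-study numbers, whose yearly inputs are not given in this post.

```python
def average_annual_roi(net_savings_by_year, initial_cost):
    """Average annual net savings as a percentage of the initial cost.

    Assumed definition: (sum of yearly net savings / number of years)
    / initial cost * 100.
    """
    average = sum(net_savings_by_year) / len(net_savings_by_year)
    return average / initial_cost * 100


def payback_months(first_year_net_savings, initial_cost):
    """Months until savings cover the cost, assuming (for illustration)
    that first-year savings accrue evenly month by month."""
    return initial_cost / (first_year_net_savings / 12)


# Hypothetical project: $100K up front, net savings growing over three years.
cost = 100_000
savings = [400_000, 600_000, 800_000]

roi = average_annual_roi(savings, cost)
months = payback_months(savings[0], cost)
print(f"ROI: {roi:.0f}%  payback: {months:.1f} months")
```

Under this definition a 600% ROI means each year's average net savings are six times the initial cost, which is also why payback periods of a few months go hand in hand with ROIs of this size.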
At the end of the Super Bowl, people created 12,233 tweets per second. And it turns out that was less than half the
number of tweets created in Japan
on December 9th, when 25,088 tweets per second were recorded about
the Castle in the Sky anime movie.
Which, according to the Chinese, is nothing compared to the 32,312
messages per second sent on their Twitter-like Sina Weibo system during the
beginning of the Chinese New Year.
Within the government space, we’re no strangers to our own Big Data. Whether you’re in the DOD or NASA, the IRS or
SSA, you’ve got your own Big Data to deal with.
Last week, Forrester Research released a report that should help those in
government understand the Big Data Market.
It is called “The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012
(February 2, 2012).” The IBM technologies evaluated were IBM InfoSphere
BigInsights (IBM’s Hadoop-based offering), and IBM Netezza Analytics. In this
evaluation, IBM was placed in the Leaders category of the Wave and achieved the
highest possible score in both the Strategy and Market Presence segments. In
the third segment, Current Offering, IBM received the second highest score. You
the complete report here.
The report by analyst James
Kobielus states, “IBM has the deepest Hadoop platform and application portfolio.”
The IBM Analytics Solution
Center in Washington, DC
also focused on how to handle Big Data at its January 19th
seminar. The seminar covered various
aspects of Big Data including data-in-motion processing software, Hadoop
software, SONAS (scale out network attached storage), and the Netezza data
1. Big Data in Motion
Getting back to the tweeting, if you’re a government agency and you need to get
actionable insights into tens of thousands of tweets per second which might be
about an unfolding crisis, how would you do it?
InfoSphere Streams is unlike anything else in the market in its ability
to ingest, analyze and act on data “in motion” – that is, data is processed and
analyzed at microsecond latencies.
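To make "data in motion" concrete: the defining trait is that each event is examined as it arrives, with only a small window of recent events kept in memory, rather than being stored first and queried later. The sketch below shows that pattern in plain Python. It does not use the InfoSphere Streams API and makes no claim about its latency; the function name, window size, threshold, and timestamps are hypothetical.

```python
from collections import deque


def rate_alerts(timestamps, window_seconds, threshold):
    """Consume event timestamps (seconds, ascending) one at a time and yield
    (timestamp, count) whenever the number of events inside the trailing
    window reaches the threshold."""
    window = deque()
    for t in timestamps:
        window.append(t)
        # Drop events that have fallen out of the trailing window.
        while window[0] <= t - window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield t, len(window)


# Hypothetical arrival times of tweets mentioning one keyword.
arrivals = [0.0, 0.2, 0.4, 0.5, 0.9, 2.0, 2.1, 3.5]

alerts = list(rate_alerts(arrivals, window_seconds=1.0, threshold=4))
print(alerts)
```

Because the function is a generator, it can be fed from a live source and will raise an alert as soon as a burst begins, without waiting for the stream to end.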
2. Hadoop Big Data
Hadoop is an open source codebase supported by the Apache Software Foundation. It is designed to process large volumes of
unstructured data. For example, if a government agency wanted to analyze months
of tweets or documents in non-real time, the Hadoop Distributed File System
would be a good choice. The enterprise
class IBM Hadoop-based offering, BigInsights, is designed with system
management, security, and performance features that go beyond what is available
in the open source. It provides the
ability to analyze and extract information from a wide variety of data sources,
and promotes data exploration and discovery.
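For readers who have not seen the map/reduce style of batch processing that Hadoop is known for, here is a small sketch in plain Python. It does not call Hadoop or BigInsights; the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_hashtags`) are hypothetical and everything runs in one process. It counts hashtags across a pile of stored tweets, the kind of non-real-time analysis described above.

```python
from collections import defaultdict


def map_phase(tweet):
    """Emit (hashtag, 1) for every hashtag found in one tweet."""
    for token in tweet.lower().split():
        token = token.rstrip(".,!?")  # drop trailing punctuation
        if token.startswith("#") and len(token) > 1:
            yield token, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def count_hashtags(tweets):
    """Run map, shuffle, and reduce over an iterable of tweet strings."""
    pairs = (pair for tweet in tweets for pair in map_phase(tweet))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    tweets = ["Flooding downtown #storm #help", "Stay safe #Storm!", "no tags here"]
    print(count_hashtags(tweets))
```

In a real cluster the map and reduce steps would run in parallel on many machines, next to the data; the sketch only shows how the work is divided.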
Network Attached Storage, or NAS, has become a very popular way to provide storage
within an organization. However, NAS has
a number of limitations when dealing with
Big Data, including the number of objects (files) it can support, support
for very large files, the I/O bandwidth
it can deliver to applications, and fragmented data management across multiple
systems. The IBM SONAS system is
designed to overcome these limitations and look like a very large virtual
system to the applications.
4. Data Warehouse Appliance
Data warehouses, when used for large volumes of structured data, can be costly to
operate and maintain, and can be very slow when used for sophisticated
analysis. The Netezza appliance is a
dedicated device that requires no tuning or storage administration and uses special
hardware chips to accelerate the performance of advanced analytics.
Want to learn more?
- More details on the topics can
be found at the ASC Website under
- On the educational front, we
provide free online training through BigDataUniversity.com. To
date, more than 13,000 students have registered for courses on Hadoop,
cloud computing and more.
We are working with a broad range of clients to help them define
their big data strategies. We look forward to working with you on your Big Data
The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012,
Forrester Research, Inc., February 2, 2012. The Forrester Wave is copyrighted
by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of
Forrester Research, Inc. The Forrester Wave is a graphical representation of
Forrester's call on a market and is plotted using a detailed spreadsheet with
exposed scores, weightings, and comments. Forrester does not endorse any
vendor, product, or service depicted in the Forrester Wave. Information is
based on best available resources. Opinions reflect judgment at the time and
are subject to change.
In the 1980s, John Naisbitt wrote, “We have for the first
time an economy based on a key resource [information] that is not only
renewable, but self-generating. Running
out of it is not a problem, but drowning in it is.[i]” Little did Naisbitt know how much information
we’d be creating 30 years later. By some
estimates we are generating over 1 zettabyte (1x10^21 bytes) per year[ii]. How do you avoid drowning in all that data
and gain insights? That is the realm of
Big Data Solutions.
The Analytics Solution Center recently ran a
seminar on Big Data. We started off
talking about the ‘big data conundrum.’
The volume of data is growing so rapidly that the fraction of data
an enterprise can analyze is decreasing.
Because of this gap, we’re getting ‘dumber’ about our organizations and
jobs over time. This is driving the need
for improved analytics and platform technology that can help us process this
large volume of data.
What do customers want to do with big data? Popular requests we’ve heard include: IT log
analytics, RFID tracking and analytics, fraud detection and modeling, risk
modeling, a 360° view of a
person/place/thing, call center record analysis, and fusion of multiple
unstructured objects (e.g., pictures, audio).
Since we now collect so much data, the possibilities are limited only by
your imagination and our ability to extract insights from the data.
In order to process these large volumes of data, special
systems and applications are being deployed.
Many of these are based on the Apache Hadoop middleware, which supports a
distributed file system and processing environment for scalability,
flexibility, and fault tolerance. IBM’s
big data platform includes offerings based on Apache Hadoop with enhancements
to improve workload optimization, security, and cluster hardening. The IBM offering (BigInsights) also comes
packaged with advanced analytical capabilities for data visualization, text
analysis, and support for machine learning analytics. One interesting item was the announcement
that the enhancements would be packaged to allow them to work with other Hadoop
distributions, such as the Cloudera™ Hadoop distribution.
Another offering discussed in the seminar was the stream computing
offering, designed to efficiently process “data in motion,” such as stock ticker
streams and social media feeds.
One of the biggest challenges given the huge volume of
information is finding the right information.
Governments, utilities, and financial companies have this problem in
particular because of the huge volumes they deal with. A recent IBM acquisition, Vivisimo, has
developed a next-generation search engine to provide search across multiple big
data and traditional platforms. Vivisimo
provides a scalable search application framework that can perform a federated
search across many different data sources including the web, social media,
content stores, and more traditional structured database systems. One feature that may be particularly
appealing to government agencies and corporate environments is its ability to
map individual access permissions of each data item, authenticate users against
each target system and limit access to information a user would be entitled to
view if they were directly logged into the target system.
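The combination of federated search and per-item permissions can be sketched in a few lines of plain Python. This is not Vivisimo's API; the `Document`, `Source`, and `federated_search` names are hypothetical, and real systems would authenticate against each target system rather than pass group names around. The sketch only shows the core idea: every source is queried, and each source returns only the items the user would be entitled to see there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str
    title: str
    text: str
    allowed_groups: frozenset  # groups permitted to view this item


class Source:
    """One searchable repository (illustrative stand-in for a real system)."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def search(self, query, user_groups):
        """Return matching documents the user is entitled to view."""
        q = query.lower()
        return [
            d for d in self.documents
            if q in d.text.lower() and d.allowed_groups & user_groups
        ]


def federated_search(sources, query, user_groups):
    """Query every source and merge the permitted results in source order."""
    user_groups = frozenset(user_groups)
    results = []
    for source in sources:
        results.extend(source.search(query, user_groups))
    return results


if __name__ == "__main__":
    hr = Source("hr", [Document("hr", "Payroll memo", "budget for payroll", frozenset({"hr"}))])
    web = Source("web", [Document("web", "Public budget", "the city budget summary", frozenset({"public"}))])
    for doc in federated_search([hr, web], "budget", {"public"}):
        print(doc.source, doc.title)
```

Filtering inside each source, rather than after merging, mirrors the idea that a user should see only what they could view if logged into that system directly.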
They offer a clever search tool that provides easy
navigation and discovery, using both structured metadata (faceted search) and
keywords that the program dynamically discovers based on analysis of
unstructured content. Vivisimo provides an agile development layer, to allow
users to quickly create applications and dashboards to discover, navigate and
The seminar also featured a customer case study of using big
data for cybersecurity mission operations. IP traffic is growing at a 29% CAGR, and with it
the cyber-threats the customer is facing. Unfortunately, the customer’s headcount
isn’t growing, so more automated ways are needed to detect and respond to threats. For this application, timeliness is key:
threats must be dealt with in real time. To
identify potential threats, they want to be able to compare current threat and
traffic data to norms from the recent past and from similar periods in the
past. Their solution utilizes the
Netezza data warehouse appliance for near real-term data and IBM BigInsights
for long-term storage. The solution eliminates
as many mundane “data retrieval” tasks as possible for the analyst, and provides
the analyst with those datasets that have a high probability of being
“interesting.” In this way, the solution helps the analyst deal with the
extreme data volumes, and yet remains flexible to the changing threat
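Comparing current traffic to norms from the past can be illustrated with a very simple baseline check. The seminar did not describe the customer's actual method, so the z-score test below is my own assumption, and the names `is_anomalous` and `flag_interesting` are hypothetical. The sketch flags any host whose current traffic is far outside what its own history would suggest, so the analyst sees only the "interesting" ones.

```python
from statistics import mean, stdev


def is_anomalous(current, history, z_threshold=3.0):
    """Return True if `current` is far from the norm implied by `history`.

    Assumes a simple z-score test; with fewer than two history points
    there is no baseline, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


def flag_interesting(current_by_host, history_by_host, z_threshold=3.0):
    """Return the sorted hosts whose current value deviates from their baseline."""
    return sorted(
        host
        for host, value in current_by_host.items()
        if is_anomalous(value, history_by_host.get(host, []), z_threshold)
    )


if __name__ == "__main__":
    history = {"a": [100, 102, 98, 101, 99], "b": [50, 52, 48, 51, 49]}
    current = {"a": 101, "b": 500}
    print(flag_interesting(current, history))
```

In practice the history for "similar periods" (same hour last week, for example) would come from the long-term store, and the threshold would be tuned to the analyst's tolerance for false alarms.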
Do you have an opportunity to use massive amounts of data to
accomplish a business/mission objective that couldn’t be done when we were limited
to small volumes of data? Do you have an
innovative solution? We’d like to hear
your stories about big data.
For more on the Big Data seminar, see our ASC website under past events.
[i] Naisbitt, John,
Megatrends: Ten New Directions Transforming Our Lives, NY Warner Communications
Company, 1982, pages 23-24
[ii] IDC Digital Universe
My work this year has taken me from Big Data and Analytics towards Cognitive Computing and what IBM is now dubbing Cognitive Businesses (or Cognitive Government in our case). Cognitive businesses leverage cognitive computing technology (think Watson) to enhance, scale, and accelerate the expertise of their personnel. Below is the summary of the first part of a symposium I co-chaired last week. I'm happy to answer any questions you may have.
The AAAI Fall Symposia on November 12-14 included tracks on AI for Human-Robot Interaction, Cognitive Assistance, Deceptive and Counter-Deceptive Machines, Embedded ML, Self Confidence in Autonomous Systems, and Sequential Decision Making for Intelligent Agents. This post will provide my general impressions of the Cognitive Assistance symposium.
Jerome Pesenti, IBM VP of Watson Core Development, provided the first-day keynote. He started with the great quote from Fred Jelinek (Cornell/IBM/JHU) that “Every time I fire a linguist, the performance of the speech recognizer goes up.” He then talked about how deep learning is enabling recognition systems that approach or surpass human performance. This led to a lively discussion with the audience on the universality of learning algorithms and whether the machines were learning in the same manner that humans learn (no). Jerome finished with some applications of Watson, including the Oncology Advisor, citizen support (e.g., tax questions), and security (finding relationships between data).
The rest of the morning was filled with examples of cognitive assistance: help with legal tasks such as filing a protective order (Karl Branting), human-computer co-creativity in the classroom (Ashok Goel), and a tool to help SMEs define their vocabulary to find the most relevant content on the web (Elham Khabiri).
Much of the symposium had lunch together, and a lively discussion on cognitive assistance ensued. One topic that I found interesting was ultimate chess, where human-machine teams compete. While these teams have beaten computer-only teams in the past, Murray Campbell noted that the advancements in chess-playing computers are decreasing the value-add of humans to the team.
The afternoon session of Day 1 started with two interesting talks on cognitive assistance for helping those with cognitive disabilities. Madelaine Sayko described Cog-Aid, which would include a cognitive assessment, a recommender system (based on the assessment), and an intelligent task status manager for starters. Then Daniel Sonntag described the Kognit technology program, which includes tracking dementia patients’ behavior using eye tracking and mixed reality displays to assist patients in performing activities of daily living. Kevin Burns presented a sense-making approach that could be used by an intelligence analyst to help understand and define the prior and posterior probability calculations as new evidence is added. This could eventually be embodied in a cognitive assistant. Next came a presentation by Keith Willett on capturing cybersecurity operational patterns to facilitate knowledge chaining.
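For readers unfamiliar with prior and posterior probabilities, the general idea of updating a belief as evidence arrives can be shown with Bayes' rule. This is a generic textbook sketch, not the sense-making approach presented at the symposium; the function names and the numbers in the example are made up for illustration.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Apply Bayes' rule once.

    prior: P(H) before seeing the evidence
    p_evidence_given_h: P(E | H)
    p_evidence_given_not_h: P(E | not H)
    Returns P(H | E).
    """
    numerator = p_evidence_given_h * prior
    denominator = numerator + p_evidence_given_not_h * (1.0 - prior)
    if denominator == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


def update_sequentially(prior, likelihood_pairs):
    """Fold in several independent pieces of evidence, one at a time.

    Each posterior becomes the prior for the next piece of evidence.
    """
    belief = prior
    for p_e_h, p_e_not_h in likelihood_pairs:
        belief = posterior(belief, p_e_h, p_e_not_h)
    return belief


if __name__ == "__main__":
    # Start undecided, then see two reports that are each four times
    # as likely if the hypothesis is true as if it is false.
    print(update_sequentially(0.5, [(0.8, 0.2), (0.8, 0.2)]))
```

Treating each piece of evidence as independent is a simplifying assumption; real analysis often has to account for reports that share a common source.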
The final session of the day was a panel discussion of workforce issues associated with cognitive assistants led by Murray Campbell. Erin Burke of Fordham University Law School talked about how legal education must transition and that she is working at the intersection of law, big data, and cognitive computing. Jim Spohrer, Director of IBM’s University Programs, provided some predictions including that by 2035 everyone will be a manager and will have at least one Cognitive Assistant working for them. A lively discussion ensued with the audience about our forthcoming relationship with Cogs including whether we could trust them, unintended consequences, whether we can build common sense into a Cog, and whether our brains will atrophy as we depend on Cogs.
I’ll cover Day 2 in the next blog post.