I gave the keynote address to George Washington University’s DATA Conference on December 2. This is what I told the students. Please reply with your thoughts and ideas to extend the conversation on how to make the world a better place through data.
Think about how you can use data and data science to make the world a better place. We are at a unique time in history: the world's digitized systems now collect huge amounts of data (almost 1 ZB, or 1 times 10 to the 21st power bytes), and Data Science techniques are becoming more powerful and easier to use. These two factors will give you the ability to do more to improve the lives of your fellow students, their professions, and society at large than has ever been possible before.
Data Science innovation will be central to solving humanity's grand challenges by capitalizing on this unprecedented quantity of data now being generated on human behavior and attitudes, human health, commerce, communications, migration and more. You can help to accelerate and advance the development and democratization of Data and Data Science solutions that can address specific global challenges related to poverty, hunger, health, education, the environment, and others.
To help stimulate your imagination, I will present several examples from our work at IBM. The key is to combine your growing expertise in Data Science with your passions. At IBM, we are encouraging students to combine Data Science studies with other disciplines, such as natural science, social sciences, and healthcare: the problem domains where Data Science can be put to work.
For the first example of “Doing Good”, I’d like to tell you about IBM Fellow Chieko Asakawa. She became blind at the age of 14, and as a result has devoted her professional life to building solutions that allow her and other blind people to access the world and regain their independence. Chieko has developed an object recognition solution that lets her “see” ordinary objects in her home and at stores, so she can pick out wine or know the directions on a package, all using machine learning. She has also developed an indoor navigation system that helps her to easily get from place to place at work. Both use smartphones as the user interface. See these links for more details on Chieko’s inventions: Image rec: https://www.youtube.com/watch?v=RNp4OpToAdQ (many interesting solutions; Chieko’s is featured at minute 17); Nihonbashi Tokyo NavCon: https://www.youtube.com/watch?v=mlGcutE2t2A ; TED talk: http://www.ted.com/talks/chieko_asakawa_how_new_technology_helps_blind_people_explore_the_world
The second example is from IBM’s Cognitive Build Competition. Two IBM employees, Karibi and Jenn, proposed and prototyped a solution to help children with autism. The solution, dubbed Pino after Karibi’s nephew, uses the Watson Conversation service to help children with autism communicate more independently by providing real-time verbal prompts. It can also be used with other conditions that affect communication ability, such as stroke and Alzheimer's disease. I met Jenn a few weeks ago. She told me, “At a birthday party a couple of years ago, I saw how upset my son was when he didn't receive a cupcake because he couldn't say "yes" when offered one. He needs a therapist or caregiver to prompt him to answer basic questions. He has a communication device that can help him speak, but it requires him to know he needs to respond… When Cognitive Build started, I thought it would be great if my son's communication device could be cognitive so it could help him to be more independent when I'm not around.” Learn more at this link: https://medium.com/cognitivebusiness/addressing-autism-project-pino-3741ce13d39
The third example is about the opioid epidemic, which has become one of the worst health crises in US history. In 2015, more than 90 Americans died every day from opioid overdoses, a number comparable to deaths in car accidents and projected to have risen further in 2016 and 2017. The Centers for Disease Control and Prevention (CDC) estimate the total economic burden of prescription opioid abuse to be $78.5 billion a year, including healthcare costs, lost productivity, and criminal justice involvement.
For many addicts, the problem begins with legitimate healthcare treatment in which opioid painkillers are first prescribed, such as for surgeries or chronic back pain. During treatment, some patients become addicted and go on to suffer the well-documented consequences of addiction, while others do not, even if they become long-term users. To combat the epidemic, it is vital to understand the exact circumstances under which medically sanctioned treatments can devolve into addiction. That’s where data science comes into play.
This summer, we took the first steps in tackling this question in a project within our Science for Social Good program. The team, led by Bhanu Vinzamuri, focused on analyzing the relationship between factors surrounding an initial opioid prescription and a subsequent diagnosis of addiction. We found that receiving an initial prescription for more than 7 days has a significant correlation with long-term use, as does the use of synthetic opioid prescriptions. We also confirmed that days of supply matters much more for addiction than the quantity (e.g. in milligrams of morphine equivalent) prescribed per day. Other factors that were positively correlated with long-term use, and which doctors should take into account when prescribing opioids, were age, certain regions of the country, rural location, healthcare utilization, and depression, osteoarthritis, or diabetes. See more projects at http://www.research.ibm.com/science-for-social-good/#projects
Because of the power that Data Science and data are bringing to humans, we need to be sure they are a force for good and not for evil. IBM and the XPRIZE Foundation believe Artificial Intelligence (and the data science algorithms it uses) will be central to solving humanity's grand challenges. Solutions to pressing problems related to health and wellbeing, education, energy, environment, and other domains important to humanity can potentially be found by capitalizing on the unprecedented quantities of data and recent progress in emerging AI technologies. That’s why IBM is putting up $5 million for the Watson AI XPRIZE. See https://ai.xprize.org/ for more details.
But even if you are not up for competing for the AI XPRIZE, there is a lot that you can do. Find a societal problem that you are passionate about. It all starts with a problem or need, like Chieko’s blindness, or Jenn’s child with autism, or the opioid crisis. Then come up with an idea or approach. There is a lot of data now available. Our Data Science Experience is out there on the web for you to play with. It is designed to allow data scientists, business analysts, stakeholders, and programmers to work together on a data project. It’s easy to use. Go out and try it. There are tutorials to guide you. It is at https://datascience.ibm.com/ . Don’t just study the problem and write a school paper; create a solution that helps people. Your university’s office of entrepreneurship can help you to build a business case for your solution. Finally, consider pitching your idea at one of the many Pitchfests that are around. One I’m familiar with that exposes your ideas to corporate sponsors such as IBM is NCET2. They are at https://ncet2.org/. Go ahead and make the world a better place!
In medieval times, Alchemists hoped to convert base metals
into the noble metal gold through the use of a Philosopher's Stone.
Today, in the field of information science, we talk about
Information Alchemy, converting data into information and then into
knowledge. Some people even add a 4th
stage of converting knowledge into wisdom[i], but
that will be for another blog post.
Data is defined as the raw characters or numbers, whereas information is
defined as the processing of that data into various relationships so they have
some meaning. Dr. Eisenberg at the University of Washington describes knowledge as the
“collected, combined, organized, processed information for a purpose.” Over time, it is thought that accumulated and
refined knowledge leads to Wisdom.
This year, the total of all digital data created is forecast
to reach close to 4 zettabytes, or 4 x 10^21 bytes, according to IDC[ii]. This is nearly four times the 2010 volume and
it is growing rapidly. All of this data
should let us make a smarter and better planet.
However, today we’re drowning in all this data because we don’t have the
time as individuals to process all this information, and we don’t have computer
systems that can turn this data into insight.
But soon that will change.
We are entering a new era in computing which IBM is calling Cognitive
Computing. The first of these systems is
the IBM Watson system which debuted on the Jeopardy! Show 2 years ago. Traditional computing systems have done a
great job with handling data, including storing it and manipulating it into
information. So now we have lots of
financial, inventory, customer, and all sorts of other, mostly numerical,
We also have lots of unstructured information such as text,
audio, graphics, and video. We used to say that 80% of the new bytes being
created today were associated with unstructured data, but that number is
probably closer to 90% given all the video being created these days. This text and multimedia information is
human-readable – in fact, it is designed by humans for humans to understand but
is not easily understandable by today’s computers.
And that is a considerable problem. Today, the transformation of information into
knowledge is primarily done in people’s heads.
Not just by scientists, engineers, or financial analysts, but by
everyone who reads an article or watches a video. The time available for people (some would
say skilled people) to analyze information to gain insights (knowledge) is the
limiting factor in the production of new knowledge today. To say this another way, we are now
information-rich, but knowledge-poor.
The goal of the cognitive computing efforts is to remove
this limitation by designing computer systems that can take this abundance of
information, much of it in human readable/viewable formats, and convert it into
knowledge. For example, in the Jeopardy!
IBM Challenge, the Watson computer system analyzed its deep information stores
to find the answer that best answered the clue and the category. It did this feat by utilizing many different
algorithms to attempt to “understand” the text information and a machine
learning (artificial intelligence) scoring system to select the best response.
In a more significant effort, IBM is working with Memorial
Sloan-Kettering and WellPoint (a major BC/BS licensee) to use cognitive
computing technology to assist doctors by helping to identify individualized
treatment options for patients with cancer. It is, in effect, creating knowledge of the
appropriate treatment options from information about the patient’s condition
and medical history, and information from clinical trials and best practices on
While the field of cognitive computing is just beginning, I believe
over the next several years, we will learn how to perform “Information Alchemy”
and we’ll see how this newly created knowledge can benefit our organizations
and our lives.
As the quintessential information-based organizations, government agencies may have the greatest need for "Information Alchemy." Do you see this need? Do you see opportunities for Cognitive Computing at your agency?
Director of IBM’s Analytics
[i] Eisenberg, Mike,
“Information Alchemy: Transforming Data and Information into Knowledge and
Wisdom”, March 30, 2012, http://faculty.washington.edu/mbe/Eisenberg_Intro_to_Information%20Alchemy.pdf
Derechos, Droughts, Hottest July on Record, Shattered
High Temp Records, Greenland Ice Sheet Melts. Just what is going on with the weather these
days? Is this weather really abnormal or
does it just seem to be that way? Is this part of a trend? Does global climate change mean we’ll have
more of these extreme weather events? Being
a data and analytics person, I started looking to see what data analysis had
been done on this subject.
The US Climate Extremes Index[i] provides
a measure to track the occurrence of extreme data (although it doesn’t take
into account Derechos and other severe wind events). The trend of the index (smoothed) has been on
the rise since 1970 and now is at an all time high, as shown below. The Index
was at a record high of 46% during the January-July period, over twice the average
value and surpassing the previous record CEI of 42%, which
occurred in 1934. Extremes in warm
daytime temperatures (83 percent) and warm nighttime temperatures (74 percent)
both covered record large areas of the nation, contributing to the record high
year-to-date USCEI value.
This index is
compiled by combining measurements throughout the country (1,218-station US Historical Climatology Network)
that show the percentage of the country impacted by extreme weather in terms of
maximum temperatures much above or below normal, minimum temperatures
above/below normal, percentage of country in severe drought/severe moisture
surplus, percentage of the country with a much greater than normal proportion
of precipitation derived from extreme 1 day events, and the percentage of the
country with a much greater than normal number of days with or without precipitation.
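To make the idea of the index concrete, here is a minimal sketch of how component indicators could be combined into a single number. It assumes the index is a simple arithmetic average of its components, each expressed as the percentage of the country affected, which is my reading of NOAA's published definition. The indicator names and the sample values are made up for illustration and are not real observations.

```python
from statistics import mean

# Hypothetical indicator values: percent of U.S. area affected by each extreme.
# These numbers are placeholders, not NOAA data.
SAMPLE_INDICATORS = {
    "max_temperature": 50.0,
    "min_temperature": 40.0,
    "drought_or_moisture_surplus": 30.0,
    "extreme_one_day_precipitation": 20.0,
    "precipitation_day_counts": 10.0,
}


def climate_extremes_index(indicators):
    """Return the simple average of the component percentages."""
    if not indicators:
        raise ValueError("at least one indicator is required")
    for name, pct in indicators.items():
        if pct < 0:
            raise ValueError(f"{name} must be a non-negative percentage")
    return mean(indicators.values())


if __name__ == "__main__":
    print(f"CEI = {climate_extremes_index(SAMPLE_INDICATORS):.1f}%")
```

Because every component is a share of the country's area, the combined value can be read the same way: a higher index means a larger share of the country experienced some kind of extreme.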
The U.S. Global
Change Research Program in 2009 published a study which documented the changing
climate and its impact on the United
study uses 3 standard forms of data analysis: 1) reports on observations, 2)
predictions based on the observed trends, and 3) modeling to better predict future
climate changes based on various assumptions about the amount of heat-trapping
gases in the atmosphere. While the first
two types are based on large quantities of collected data, they use only U.S.
observations. The modeling, however,
must be done on a global basis which substantially increases the amount of data
that must be crunched.
Here are some of the findings as they relate to extreme
Overall Warming of the Climate
Temperatures, on average, in the 1993-2008 period are 1-2ºF
higher than in the 1961-79 baseline. By
the end of the century, the average U.S. temperature is projected to
increase by approximately 7-11ºF under a high emissions model and by
approximately 4-6.5ºF under a lower emissions scenario. The temperature observations show that there
has been an increase in warmer and more frequent warm days and warm nights, and
warmer and less frequent cold days and cold nights in most areas.
More intense, more frequent, and longer-lasting heat waves
In the past several decades, there has been an increasing
trend in high-humidity heat waves, characterized by extremely high nighttime
temperatures. Parts of the South that
currently have about 60 days per year with temperatures over 90ºF are projected
to experience 150 or more days a year above 90ºF under a higher emissions
scenario. In addition to occurring more
frequently, at the end of this century these very hot days are projected to be
about 10ºF hotter than they are today.
Increased extremes of summer dryness and winter wetness with a generally
greater risk of droughts and floods.
Trends in drought have strong regional variations. Over the past 50 years, with increasing
temperatures, the frequency of drought in many parts of the West and Southeast
has increased significantly. Models show
that the Southwest, in particular, is expected to experience increasing drought
as the dry zone just outside of the tropics expands northward with global
Precipitation coming in heavier downpours, with longer dry periods in
While average precipitation over
the nation as a whole increased by about 7% over the past century, the amount
of precipitation falling in the heaviest 1% of rain events increased nearly
20%. One of the outputs of the climate
modeling is to project the probability of certain events. For example, heavy downpours that are now a “1
in 20 year occurrence” are projected to occur about “once every 4-15 years” by
the end of the century. These heavy downpours are expected to be
10-25% heavier by the end of the century than they are now. This will likely cause more flooding events
(flooding depends both upon the weather and the susceptibility of the area to
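The "1 in 20 year" language is easier to reason about as a probability. The sketch below converts a return period into an annual chance, and into the chance of seeing at least one such downpour over a span of years. It assumes each year is independent and the annual chance stays constant, which is a simplification of my own and not something the climate study states; the function names are mine as well.

```python
def annual_probability(return_period_years):
    """Chance of the event in any single year, e.g. 20 -> 0.05."""
    if return_period_years <= 0:
        raise ValueError("return period must be positive")
    return 1.0 / return_period_years


def probability_within(return_period_years, horizon_years):
    """Chance of at least one event during the horizon, assuming independent years."""
    p = annual_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


if __name__ == "__main__":
    # Today's 1-in-20 event versus the projected once every 4-15 years.
    for period in (20, 15, 4):
        print(
            f"1 in {period} years: "
            f"{annual_probability(period):.1%} per year, "
            f"{probability_within(period, 30):.1%} over 30 years"
        )
```

Seen this way, moving from a 1 in 20 event to a 1 in 4 event raises the chance in any given year from 5% to 25%.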
More intense but fewer severe storms
Reports of severe weather such as
tornadoes and severe thunderstorms have increased during the past 50 years.
However, the climate study indicates that much of this may be due to better
monitoring technologies, changes in population areas, and increasing public
awareness. Climate models do project an increase in the frequency of
environmental conditions favorable to severe thunderstorms. But the report notes, “the inability to
adequately model the small-scale conditions involved in thunderstorm
development remains a limiting factor in projecting the future character of
severe thunderstorms and other small-scale weather phenomena.[iii]” Advances in modeling and big data analytics,
as well as improved monitoring networks are likely to reduce this limitation in
The June Derecho that hit the Washington metropolitan
area shows an example of the current state of the art in forecasting a severe
storm. The Storm Prediction Center of
NOAA was able to provide approximately 4 hours advance warning of the
storm. Longer term predictions would
require additional data about the atmospheric instability that propelled the
Derecho from Iowa to the Washington
Metro area, as well as better real time modeling.
Shift of storm tracks towards the poles
Cold season storm tracks have
shifted northward over the last 50 years, with a decrease in the frequency of
storms in mid-latitude areas. The
northward shift is projected to continue, and strong cold season storms are
likely to become stronger and more frequent, with greater wind speeds and more
extreme wave heights.
The climate changes will have an
interesting effect on the so called “lake-effect”. Over the past 50 years, there is a record of
increased lake-effect snowfall near the Great Lakes. As the climate has warmed there is less ice
on the Great Lakes which has allowed greater
evaporation from the surface resulting in heavier snowstorms. Eventually, the temperatures are expected to
rise sufficiently that much of the precipitation will end up falling as rain,
reducing the snow totals.
While trending of individual elements such as temperatures
is useful, accurate predictions require consideration of the interaction
between the climate elements. For
example, there is a mutual enhancement effect between droughts and heat
waves. Heat waves enhance soil drying,
and drier soil heats the air above more since no energy goes into evaporating
the soil moisture. Big data modeling can
show the results of this escalating cycle of warming on the future climate.
The New Normal
So it seems that all this abnormal weather we are seeing
will become the new normal. Forewarned
Analytics Solution Center, Washington, DC
[ii] Global Climate Change
Impacts in the United States,
Thomas R. Karl, Jerry M. Melillo, and Thomas C. Peterson, (eds.) Cambridge University Press, 2009
On July 4th, CERN scientists announced that they
observed a particle that strongly resembles the Higgs boson, a critical element
of the standard model of particle physics.
This particle is thought to be responsible for the characteristic of
mass, which gives objects weight when combined with gravity.
Detection of the Higgs Boson would not have been possible
without the last decade’s advances in processing big data. Joe Incandela, CMS Spokesman at CERN,
explained that if every collision that they scanned was a sand grain, these
sand grains would have filled up an Olympic sized pool over the last 2
years. They had to find the several
dozen or so grains of sand that exhibited characteristics consistent with the
In addition to developing the Large Hadron Collider, the
CERN teams also developed a data strategy to deal with the data from the
hundreds of millions of particle collisions occurring each second. The sensors record the raw data on billions
of events occurring in the proton collider. These readings are then reconstructed
to show the energy and directions of many particle traces. The data goes through 2 stages of filtering
to reduce the data on 40 million collisions/sec down to 10 million interesting
ones per second, and then to 100 or 200 collisions that are studied in
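The two stages of filtering can be pictured as a small pipeline in which a cheap first cut discards most events and a stricter second cut keeps only the few worth studying. This is only a toy sketch: the event fields (`energy_gev`, `n_tracks`) and the thresholds are hypothetical names and values I chose for illustration, not CERN's actual selection criteria.

```python
def first_stage(events, min_energy_gev):
    """Fast, cheap cut applied to every event as it streams in."""
    return (e for e in events if e["energy_gev"] >= min_energy_gev)


def second_stage(events, min_tracks):
    """Stricter cut applied only to events that survived the first stage."""
    return (e for e in events if e["n_tracks"] >= min_tracks)


def select_events(events, min_energy_gev=100.0, min_tracks=4):
    """Chain the two stages lazily and collect the survivors."""
    return list(second_stage(first_stage(events, min_energy_gev), min_tracks))


if __name__ == "__main__":
    # Made-up events, for illustration only.
    toy_events = [
        {"id": 1, "energy_gev": 20.0, "n_tracks": 2},
        {"id": 2, "energy_gev": 150.0, "n_tracks": 3},
        {"id": 3, "energy_gev": 180.0, "n_tracks": 6},
    ]
    print([e["id"] for e in select_events(toy_events)])
```

Using generators means the second stage never sees an event that the first stage rejected, which mirrors the point of filtering in stages: the expensive checks run on a much smaller stream.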
According to Rolf-Dieter Heuer, director general at CERN, “The
computing power and network is a very important part of the research.” Over
15 petabytes (1 petabyte is 1 million gigabytes) are stored each year. This data is distributed through the Worldwide
Large Hadron Collider Computing Grid (WLCG) to each of 11 major Tier 1 centers
around the world, and from there to research centers and individual
scientists. In the U.S., the Open
Science Grid, supported by NSF and DOE, provides much of the compute and
storage power for this work. The
scientists use Monte Carlo simulations for
generating and propagating the physics interactions of the elementary particles
passing through the collider to determine which ones correspond to the
hypothesized behavior of the Higgs Boson.
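To give a flavor of what a Monte Carlo calculation looks like, here is a deliberately tiny example. It is not the detector simulation described above; it is a much simpler, related idea that I am substituting for illustration: simulate many "background only" experiments and ask how often chance alone produces at least as many candidate events as were observed. The background rate, the observed count, and the function names are all hypothetical.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson-distributed count (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def p_value_of_excess(observed, background_mean, trials=100_000, seed=0):
    """Fraction of simulated background-only experiments with >= observed events."""
    rng = random.Random(seed)
    hits = sum(
        poisson_sample(rng, background_mean) >= observed for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Hypothetical numbers: 5 events expected from background, 14 observed.
    print(p_value_of_excess(observed=14, background_mean=5.0))
```

A small fraction means the observed excess is hard to explain by background alone, which is the kind of question the real analyses answer with far more detailed simulations.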
What they found was a never-before-seen elementary particle
that seems to fit the behavior of the Higgs Boson and is very heavy –
approximately 133 proton masses. Further
data analysis is now needed to ascertain its spin, decay modes, and other
Think the amount of data generated by the Large Hadron
Collider is huge? The forthcoming Square
Kilometre Array radio telescope is expected to generate 100’s of Petabytes of
data per day. More on that in a future
In the 1980’s, John Naisbitt wrote, “We have for the first
time an economy based on a key resource [information] that is not only
renewable, but self-generating. Running
out of it is not a problem, but drowning in it is.[i]” Little did Naisbitt know how much information
we’d be creating 30 years later. By some
estimates we are generating over 1 zettabyte (1 x 10^21 bytes) per year[ii]. How do you avoid drowning in all that data,
and gain insights? That is the realm of
Big Data Solutions.
The Analytics Solution Center recently ran a
seminar on Big Data. We started off
talking about the ‘big data conundrum.’
The volume of data is growing so rapidly that the fraction of data that
an enterprise can analyze is decreasing.
Because of this gap, we’re getting ‘dumber’ about our organization and
job over time. This is driving the need
for improved analytics and platform technology that can help us to process this
large volume of data.
What do customers want to do with big data? Popular requests we’ve heard include: I/T log
analytics, RFID tracking and analytics, fraud detection and modeling, risk
modeling, 360° view of a
person/place/thing, call center record analysis, and fusion of multiple
unstructured objects (e.g., pictures, audio).
Since we now collect so much data, the possibilities are only limited by
your imagination and our ability to extract insights from the data.
In order to process these large volumes of data, special
systems and applications are being deployed.
Many of these are based on the Apache Hadoop middleware which supports a
distributed file system and processing environment for scalability,
flexibility, and fault tolerance. IBM’s
big data platform includes offerings based on Apache’s Hadoop with enhancements
to improve workload optimization, security, and cluster hardening. The IBM offering (BigInsights) also comes
packaged with advanced analytical capabilities for data visualization, text
analysis, and machine learning analytics. One interesting item was the announcement
that the enhancements would be packaged to allow them to work with other Hadoop
distributions, such as the Cloudera™ distribution.
Another offering discussed in the seminar was the Stream computing
offering designed to efficiently process “data in motion,” such as stock ticker
streams and social media feeds.
One of the biggest challenges given the huge volume of
information is finding the right information.
Governments, utilities, and financial companies have this problem in
particular because of the huge volumes they deal with. A recent IBM acquisition, Vivisimo, has
developed a next-generation search engine to provide search across multiple big
data and traditional platforms. Vivisimo
provides a scalable search application framework that can perform a federated
search across many different data sources including the web, social media,
content stores, and more traditional structured database systems. One feature that may be particularly
appealing to government agencies and corporate environments is its ability to
map individual access permissions of each data item, authenticate users against
each target system and limit access to information a user would be entitled to
view if they were directly logged into the target system.
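The permission-aware behavior is easier to see in a small sketch. The example below searches several in-memory "sources" and returns only the items the requesting user is allowed to see. Everything here (the `Document` class, the `allowed_users` field, and the function name) is hypothetical and for illustration only; it is not Vivisimo's API, and a real system would check permissions against each target system rather than a field on the item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str
    title: str
    text: str
    allowed_users: frozenset


def federated_search(query, sources, user):
    """Search every source, keeping only hits the user is entitled to view."""
    needle = query.lower()
    hits = []
    for docs in sources.values():
        for doc in docs:
            if needle in doc.text.lower() and user in doc.allowed_users:
                hits.append(doc)
    return hits


if __name__ == "__main__":
    # Made-up sources and documents.
    sources = {
        "wiki": [Document("wiki", "Budget FAQ", "budget overview", frozenset({"ann", "bob"}))],
        "hr": [Document("hr", "Salary bands", "budget for salaries", frozenset({"ann"}))],
    }
    for user in ("ann", "bob"):
        print(user, [d.title for d in federated_search("budget", sources, user)])
```

The same query returns different results for different users, which is the property that makes this kind of search usable across systems with different access rules.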
They offer a clever search tool that provides easy
navigation and discovery, using both structured metadata (faceted search) and
keywords that the program dynamically discovers based on analysis of
unstructured content. Vivisimo provides an agile development layer, to allow
users to quickly create applications and dashboards to discover, navigate and
The seminar also featured a customer case study of using big
data for cybersecurity mission operations. IP traffic is growing at 29% CAGR, and with it,
the cyber-threats the customer is facing. Unfortunately, the customer’s headcount
isn’t growing, so more automated ways are needed to detect and respond to threats. For this application, timeliness is key –
dealing with threats in real-time. To
identify potential threats, they want to be able to compare current threat and
traffic data to norms from the recent past, and similar periods in the
past. Their solution utilizes the
Netezza data warehouse appliance for near-real-time data and IBM BigInsights
for long term storage. The solution eliminates
as many mundane “data retrieval” tasks as possible for the analyst, and provides
the analysts with those datasets that have a high probability of being
“interesting.” In this way, the solution helps the analyst deal with the
extreme data volumes, and yet remains flexible to the changing threat
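Comparing current traffic to norms from the past can be sketched with a very simple statistical test. The example below flags a reading when it sits more than a chosen number of standard deviations away from a baseline. This is a generic z-score check that I am using for illustration; the case study did not say which technique the customer used, and the threshold and sample numbers are made up.

```python
from statistics import mean, stdev


def is_anomalous(current, baseline, threshold=3.0):
    """Flag `current` if it is more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two observations")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical connection counts per minute.
    recent_past = [100, 102, 98, 101, 99]
    same_hour_last_week = [95, 105, 100, 97, 103]
    reading = 150
    print(is_anomalous(reading, recent_past), is_anomalous(reading, same_hour_last_week))
```

Calling the function against both baselines reflects the two comparisons mentioned above: the recent past, and similar periods further back.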
Do you have an opportunity to use massive amounts of data to
accomplish a business/mission objective that couldn’t be done when we were limited
to small volumes of data? Do you have an
innovative solution? We’d like to hear
your stories about big data.
For more on the Big Data seminar, see our ASC website under past events.
[i] Naisbitt, John,
Megatrends: Ten New Directions Transforming Our Lives, NY Warner Communications
Company, 1982, pages 23-24
[ii] IDC Digital Universe
Does your government agency monitor the social media for information relevant to your mission? Should it?
IBM's Analytics Solution Center recently held a seminar to explore
how agencies and companies can obtain value and insight using social
Pat Fiorenza discussed how agencies can develop an ROI Model - Return
on Influence Model - for social media. Agencies use social media
analytics to help inform their decision making by gathering
information/research, and learn what other agencies and citizens are
saying. Interesting examples from CDC and Govloop were provided.
Learn more here.
Ed Burek, IBM, talked about how savvy companies are now tapping into
customer generated content, and how government agencies could do the same to
learn how taxpayers feel about government actions and messaging. He
gave examples of how regulatory agencies could receive the unvarnished
comments from those impacted by regulations, as well as how they could
stay on top of "negative chatter." IBM has created a framework to
derive business insight from the vast amount of social media content that is
now being transmitted. Called Cognos Consumer Insight, it provides real
time information on trends and sentiment.
Rick Lawrence, IBM Manager for Machine Learning at Watson Research
Center, next talked about the leading edge of social media analytics. He
provided examples from the research portfolio on discovering who the
key influencers are, identifying emerging topics of discussion, and
mapping the billions of tweets to concepts that we really care about.
All of the presentations are available on the ASC website under Past Events (May 10, 2012)
Does your agency care about what its constituents are saying about it
on social media? Does your agency need to have real time intelligence
on events within its mission space? With 340 million Tweets per Day, 2
million blog posts, and 500 million Facebook updates, how can you find
the important information? Social Media Analytics may be an idea
whose time has come.
Analytics Solution Center
P.S. The Center for the Business of Government issued a new report on Tweeting in Government. Pat provided a good overview here.
At the end of the Super Bowl, people created 12,233 tweets per second. And it turns out that was less than half the
number of tweets created in Japan
on December 9th, when 25,088 tweets per second were recorded about
the Castle in the Sky anime movie.
Which, according to the Chinese, is nothing compared to the 32,312
messages per second sent on their Twitter-like Sina Weibo system during the
beginning of the Chinese new year.
Within the government space, we’re no strangers to our own Big Data. Whether you’re in the DOD or NASA, the IRS or
SSA, you’ve got your own Big Data to deal with.
Last week, Forrester Research released a report that should help those in
government understand the Big Data Market.
It is called “The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012”
(February 2, 2012). The IBM technologies evaluated were IBM InfoSphere
BigInsights (IBM’s Hadoop-based offering), and IBM Netezza Analytics. In this
evaluation, IBM was placed in the Leaders category of the Wave and achieved the
highest possible score in both the Strategy and Market Presence segments. In
the third segment, Current Offering, IBM received the second highest score. You
the complete report here.
The report by analyst James
Kobielus states, “IBM has the deepest Hadoop platform and application portfolio.”
The IBM Analytics Solution
Center in Washington, DC
also focused on how to handle Big Data at its January 19th
seminar. The seminar covered various
aspects of Big Data including data-in-motion processing software, Hadoop
software, SONAS (scale out network attached storage), and the Netezza data
1. Big Data in Motion
back to the Tweeting, if you’re a government agency and you need to get
actionable insights into 10s of thousands of tweets per second which might be
about an unfolding crisis, how would you do it?
InfoSphere Streams is unlike anything else in the market in its ability
to ingest, analyze and act on data “in motion” – that is, data is processed and
analyzed at microsecond latencies.
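The idea of analyzing data "in motion" can be illustrated with a sliding window that keeps a running count of recent messages as each one arrives, instead of storing everything and querying it later. This is a plain Python sketch of the concept and has nothing to do with the InfoSphere Streams programming interface; the class name and the millisecond timestamps are my own assumptions.

```python
from collections import deque


class SlidingWindowCounter:
    """Count how many events arrived within the last `window_ms` milliseconds."""

    def __init__(self, window_ms):
        if window_ms <= 0:
            raise ValueError("window must be positive")
        self.window_ms = window_ms
        self._times = deque()

    def add(self, timestamp_ms):
        """Record one event (timestamps must be non-decreasing) and return the current count."""
        self._times.append(timestamp_ms)
        cutoff = timestamp_ms - self.window_ms
        # Drop events that have fallen out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_ms=1000)
    # Made-up arrival times for five tweets, in milliseconds.
    for t in (0, 200, 500, 1100, 1200):
        print(t, counter.add(t))
```

Each message is handled once, on arrival, and old messages are discarded as soon as they leave the window, so memory use depends on the message rate rather than on the total volume seen.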
2. Hadoop Big Data
Hadoop is an open source codebase supported by the Apache Software Foundation. It is designed to process large volumes of
unstructured data. For example, if a government agency wanted to analyze months
of tweets or documents in non-real time, the Hadoop distributed file system
would be a good choice. The enterprise
class IBM Hadoop-based offering, BigInsights, is designed with system
management, security, and performance features that go beyond what is available
in the open source. It provides the
ability to analyze and extract information from a wide variety of data sources,
and promotes data exploration and discovery.
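For readers who have not seen it, the processing model that Hadoop popularized is MapReduce: a map step emits key/value pairs, the framework groups the pairs by key, and a reduce step combines each group. The sketch below runs that pattern in memory on a couple of made-up documents. It is only an illustration of the model, not Hadoop or BigInsights code, and the function names are mine.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

On a real cluster the map and reduce steps run in parallel across many machines, with the data held in the distributed file system, which is what lets the approach scale to months of tweets or documents.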
Network Attached Storage, or NAS, has become a very popular way to provide storage
within an organization. However, NAS has
a number of limitations when dealing with
Big Data, including the number of objects (files) it can support, support
for very large files, the I/O bandwidth
it can deliver to applications, and fragmented data management across multiple
systems. The IBM SONAS system is
designed to overcome these limitations and look like a very large virtual
system to the applications.
4. Data Warehouse Appliance
Data warehouses, when used for large volumes of structured data, can be costly to
operate and maintain, and can be very slow when used for sophisticated
analysis. The Netezza appliance is a
dedicated device requiring no tuning or storage administration and with special
hardware chips to accelerate the performance of advanced analytics.
Want to learn more?
- More details on the topics can
be found at the ASC Website under
- On the educational front, we
provide free online training through BigDataUniversity.com. To
date, more than 13,000 students have registered for courses on Hadoop,
cloud computing and more.
We are working with a broad range of clients to help them define
their big data strategies. We look forward to working with you on your Big Data
The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012,
Forrester Research, Inc., February 2, 2012. The Forrester Wave is copyrighted
by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of
Forrester Research, Inc. The Forrester Wave is a graphical representation of
Forrester's call on a market and is plotted using a detailed spreadsheet with
exposed scores, weightings, and comments. Forrester does not endorse any
vendor, product, or service depicted in the Forrester Wave. Information is
based on best available resources. Opinions reflect judgment at the time and
are subject to change.
On November 30, the Partnership for Public
Service (www.ourpublicservice.org) released
their new study, “From Data to Decisions: The Power of Analytics.” [i]
Keynoting the event was Shelley Metzenbaum, Associate Director for
Performance and Personnel Management, OMB.
She told the audience that Performance Management is a core pillar of
the Obama Administration and that Measurement and Analysis was the key tenet to
PM. She encouraged the audience to
identify analytics practices that work and spread the word to others. She exhorted the audience to not just collect
data but to use the data to pinpoint problems – “Ask Why, Why, Why” with
respect to performance problems.
The report studied 7 programs[ii] in 8 federal
agencies to understand how they use analytics and how it helped them achieve
better program results. The study
provides clear examples of how data is being used to understand problems and
improve mission performance. It
documents how CMS is using data to answer the questions: why isn’t health care
quality better, and how can we direct scarce resources to improve it? In a similar fashion, VA and HUD are using
data to figure out how to reduce homelessness of Veterans including identifying
bottlenecks that are keeping their voucher program from being more
successful. David Zlowe, Performance
Improvement Officer at VA, emphasized in the study that “the power of VA’s
analytics approach isn’t in the numbers but in the discussions that are sparked … having
leadership engage in an appreciative conversation guided by hard data.” The fourth program that the
Partnership reported on in some detail is the FAA’s Safety Management
System. This program helps to identify
risks and to understand what contributes to all levels of hazards.
The Partnership event included a panel discussion with Michelle Snyder,
Deputy COO, CMS; Estelle Richman, COO
and Acting Deputy Secretary, HUD; and David Zlowe, PIO, VA. Ms.
Snyder’s advice to the audience was, “Take data, analyze it, tell the story to
the people so it relates and influences the decision makers.” Ms. Richman’s recommendation was to remember
that the analytics are but a method to accomplish the goal of creating an
outcome that can improve people’s lives.
And Mr. Zlowe summarized by saying, “We don’t lack data, we lack
We’d like to hear your experiences driving decisions based on data
in the government. If you'd like a copy of the report, write to me at: ASCdc@us.ibm.com
[i] The Study was a
collaboration between IBM’s Center for the Business of Government and the
Partnership for Public Service
[ii] HUD and VA Veterans
Affairs Supportive Housing (HUD-VASH) program; Safety Management System (SMS)
in the FAA; HHS CMS nursing homes and transplant programs; Coast Guard’s
Business Intelligence system (CGBI); NHTSA “Click it or Ticket” campaign;
Navy’s Naval Aviation Enterprise; SSA’s use of mission analytics in customer
Do today’s MBAs need Analytical Skills? That was the question that a recent Symposium
tried to answer.
On October 21, George Washington University’s
Institute for Integrating Statistics in Decision Sciences (I2SDS)
and IBM’s Analytics Solution Center
held a Symposium entitled: Analytics and the 21st Century MBA. The abstract provides a good description of
the thesis of the Symposium:
The 21st century belongs to those who can think and act analytically. No
longer is it good enough to make business decisions, no matter what the field,
based on little more than feelings or gut reactions to events. Consumer
products companies, insurance companies, banks, governments, and even sports
teams are turning to Analytics to improve their bottom line and assure their
survivability in this age of hyper-competition and increasingly severe
This Symposium… will demonstrate how Analytics is a critical component
of 21st Century Business careers, whether the practitioner's primary responsibility is
in a functional area (Marketing, Operations, Finance, Strategy, International
Business, HR) or a vertical such as Health Care or Tourism.
The Symposium provided talks by leading users of Analytics in Marketing,
Retail, Finance, and the Public Sector.
More on the Symposium is at: http://business.gwu.edu/decisionsciences/i2sds/pdf/GWU%20ASCOutline.pdf
Do you agree with the thesis? Are you
seeing more need for employees with analytical skills? Do you think those with these skills are
having an easier time getting jobs?
I’d like to hear your thoughts.
Frank Stein, Director, Analytics Solution Center
In these tough fiscal times, all agencies are going to be
focusing on doing more with less. How
does one get more done with less budget and staff? Consider turning to Analytics.
The consulting firm Nucleus Research has been looking at the
Return on Investment (ROI)
for various types of IT projects.
According to David O’Connell, Principal Analyst at Nucleus Research, “projects
involving analytics have some of the highest ROIs of any projects studied.”
Nucleus Research recently studied an analytics project IBM performed at DC
Water, the local water authority for Washington,
DC. In 2008, IBM began a first-of-a-kind project
using advanced analytics to create a smarter water system that analyzes data on
valves, storm drains, service vehicles, truck routes and more to optimize its
infrastructure. With some pipes and other assets that date to the Civil War,
maintaining high levels of service while replacing older infrastructure is an
The project has resulted in the following benefits from a combination of IBM
Asset Management and Analytics technology and services:
Field Services trucks can be automatically
routed to optimize work management. This results in more work orders being
completed each week, as well as up to 20 percent reduction of fuel costs
related to fewer truck rolls and reduced "windshield" time.
DC Water recaptured $3.8 M in revenue previously lost to defective or
degrading water meters, because the analytics behind
the advanced metering infrastructure delivers more timely identification and
replacement of those meters. Revenue was
also recaptured because DC Water can now identify and bill locations where
there is unmetered water usage.
DC Water has been able to identify
assets most critically in need of repair using predictive analytics, so aging
infrastructure replacement programs can be more accurately scheduled,
preventing costly incidents that reduce service quality, such as outages and
water main breaks. This reduces both
maintenance labor costs and call center
costs associated with emergency incidents.
Nucleus Research reported in its case
study that the DC Water project resulted in $19.677 M of benefits over 3
years with a cost of $883 K, giving an ROI of 629%.
In 2010, Nucleus Research studied a number of other public
sector analytics projects. The results
from these projects are shown in the chart below. On average, the analytics projects have
resulted in an ROI of almost 600%! This
means that over 3 years, the projects have returned benefits 6 times the
original cost of the projects. The
payback period has been less than a year in all cases. This is important to government agencies because
it means you can see cost savings in the same fiscal year that you invest in an
According to David O’Connell, Principal Analyst at Nucleus
Research, “When government entities adopt
analytics, returns are high for two reasons.
First, waste such as leaky water mains, defective meters, or benefits
overpayments can be identified and eliminated.
Second, by making information more readily available, employees spend
less time looking around for information and more time getting their jobs done.” O’Connell went on to say, “Another improvement is better use of
workers’ time. The more an organization
knows about the public it serves, their needs, and the means of delivering
service, the smarter managers’ decisions are when they hand out workers’
Has your agency implemented any analytics projects? What’s been your experience?
Don't feel comfortable sharing
publicly? I'd be happy to hear your thoughts directly as well (firstname.lastname@example.org).
(net savings year 1 + net savings year 2 + net savings year 3)/3 * 100
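The footnote formula can be turned into a short calculation. The sketch below assumes that the three-year average of net savings is divided by the initial project cost before multiplying by 100. That divisor is my assumption and is not stated in the formula as printed. The function names and dollar figures are hypothetical and are not taken from any of the case studies.

```python
def average_annual_roi(net_savings_by_year, initial_cost):
    """Average yearly net savings as a percentage of initial cost (assumed divisor)."""
    if initial_cost <= 0:
        raise ValueError("initial_cost must be positive")
    average_savings = sum(net_savings_by_year) / len(net_savings_by_year)
    return average_savings / initial_cost * 100


def payback_period_years(first_year_net_savings, initial_cost):
    """Years needed to recover the initial cost at the first-year savings rate."""
    if first_year_net_savings <= 0:
        raise ValueError("first_year_net_savings must be positive")
    return initial_cost / first_year_net_savings


# Hypothetical project: $100K up front, net savings growing over three years.
savings = [500_000, 600_000, 700_000]
cost = 100_000
roi_percent = average_annual_roi(savings, cost)
payback = payback_period_years(savings[0], cost)
print(f"ROI: {roi_percent:.0f}%  payback: {payback * 12:.1f} months")
```

Under this reading, an ROI near 600% means the average yearly net savings are about six times the initial cost, and a payback period under one year means the investment is recovered within the fiscal year in which it is made.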
Watson is the only computer on the planet that can answer a Jeopardy!
question in less than three seconds - fast enough to be competitive with the
world’s best human players.
(For those of you that missed the match, click here
to see a video clip from the match.)
But can a Watson-like computer help the government?
Watson was optimized to tackle a specific challenge:
competing against the world’s best Jeopardy! contestants. It does this by sifting through large amounts of unstructured information to find potential answers and assigning a confidence measure to each potential answer. When it has high confidence in an answer, it will buzz in and offer the answer. Beyond Jeopardy!,
IBM is working to deploy this technology
to businesses and governments dealing with the information overload
problem. At work, few of us are like
Ken Jennings, able to instantly answer almost every question thrown at us - -
with an 80-90% success rate. There is
simply too much information and more information is coming in all the
time. Whether we’re in finance, HR, IT,
or another area, our success at work depends upon dealing with huge volumes of
information, sifting through it to find
the “good information”, and then using the information to make decisions to do our
job. Technology like that used in Watson can provide for our consideration potential answers as well as the "evidence" it used to come up with potential answers.
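The "confidence measure" idea can be illustrated with a toy decision rule. This is only a sketch of the general pattern described above, not Watson's actual algorithm: the `Candidate` type, the 0.8 threshold, and the sample answers are all hypothetical. Each candidate answer carries a confidence score and the evidence behind it, and an answer is offered only when the best score clears the threshold.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    confidence: float  # between 0.0 and 1.0
    evidence: str      # short note on what supports the answer


def decide(candidates, buzz_threshold=0.8):
    """Return the most confident candidate if it clears the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= buzz_threshold else None


# Made-up candidates standing in for answers mined from unstructured text.
candidates = [
    Candidate("Answer A", 0.14, "mentioned in a single passage"),
    Candidate("Answer B", 0.91, "supported by several independent passages"),
]
choice = decide(candidates)
cautious_choice = decide(candidates, buzz_threshold=0.95)

if choice is not None:
    print(f"Offer '{choice.answer}' ({choice.confidence:.0%}): {choice.evidence}")
print("Cautious setting offers an answer:", cautious_choice is not None)
```

Returning the evidence alongside the answer is the part that matters for decision support: the person reviewing the suggestion can see why it was ranked first.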
In discussions recently with some of our military colleagues,
they came up with numerous ideas for deploying Watson-like technology. They cited the problem of “request overload” - - dealing with all the
requests for Predator and similar UAV missions.
How could they deploy their limited resources to best effect? Another person mentioned the problem of
sifting through all the intelligence information – most of it in the form of
unstructured information formats such as video and text – to find the relevant
information to a mission they were planning.
Another discussed the problem of monitoring their “situational
awareness” and how hard it was to keep track of all the data coming in. “Could Watson help monitor our security
posture and alert us to potential threats?” asked another.
Are you dealing with massive amounts of information? How could a Watson-like system assist you at
work? Do you want to recruit Watson to
work for your agency? We want to hear
your thoughts either in this blog or directly.
Write to me at email@example.com.
We’re hosting 2 free Watson Overview Briefings
on July 26 and 27. More
information at our website: www.ibm.com/ascdc
Frank Stein, Director, Analytics Solution Center
Sam Palmisano, Chairman and
CEO of IBM, got together with Michael Dell, Chairman and CEO of Dell, to release
an op-ed piece last week arguing that the government can save $1 Trillion through the
use of IT.
Click here for the statement.
Jeffrey Zients, Chief Performance Officer,
penned a blog
shortly thereafter titled, “Seeing Eye to Eye with the Tech CEO Council.”
Many of the
examples cited in the Palmisano/Dell statement relate to the use of analytics:
- Consolidating the government’s myriad supply
chains is likely to save $500 billion.
- Applying advanced analytics to reduce fraud
and error in federal grants, food stamps, Medicare payments, tax refunds
and other programs could save $200 billion.
- Using predictive
technology, New York
State is validating
tax refund requests and saving $889
million by catching phony refunds.
- Identifying suspicious Medicare activity using
analytics has shown North Carolina
how to save $25 million in just three months.
In addition to
helping to uncover fraud, waste, and abuse, I’d like to suggest 3 other ways analytics
can help the government to save money.
- Streamlining Processes: Analytics can help streamline and
optimize programs, reducing the costs of implementation while improving
service to citizens. For example,
IBM worked with Social Security to streamline their processing of
disability claims so that the majority of claims can be expedited with
little risk of allowing through unacceptable claims.
- Managing Performance: Performance management solutions can
help the management and staff of agencies to know their up-to-date
performance, and quickly spot and trouble-shoot performance issues before
they become major problems. Performance management can also help identify
successful approaches that can be replicated throughout and across
- Better decision-making: Analytics can help
agencies decide which programs to fund or the most effective
approach to take for a particular program.
By using modeling, simulation, and other data-driven approaches,
agency staff can make decisions that both save the taxpayers’ money and
deliver the best results. For
example, by modeling and optimizing the US Postal Service transportation
network, USPS is able to increase utilization of assets and save hundreds
of millions of dollars.
I’d like to hear
your ideas for how agencies can save money through employing analytics. Write to me at firstname.lastname@example.org.
See our website for
further information on using analytics in government: www.ibm.com/ASCdc
Director, Analytics Solution Center
Apparently, pretty good, according to Nucleus Research. They recently completed 2 ROI Case Studies of 2 government analytics projects. Both showed impressive results:
- Alameda County Social Services Agency's Social Services Integrated Reporting System (SSIRS) had an ROI of 631% and a payback of 2 months
- Memphis Police Department's Blue CRUSH (Crime Reduction Utilizing Statistical History) had an ROI of 863% and a payback of 2.7 months
The ROI calculations may even be conservative as Nucleus Research appears to assume that the agency and department will pay taxes on the annual benefits from the solutions.
The SSIRS system helped Alameda County reduce overpayments to non-compliant citizens, improve their win rates when claimants appealed discontinuation of benefits, and improve caseworker productivity. The system is essentially a Business Intelligence solution giving the caseworkers access to information about their clients, with dashboard and drill down capabilities. It also provides the caseworkers and managers with immediate information on "how am I doing?". Providing caseworkers with information on their clients' work participation rate and other performance metrics was key to improving the performance of the social service agency. The solution combined Cognos Business Intelligence, InfoSphere Identity Insight, and an InfoSphere warehouse to hold all the data. Identity Insight helps the caseworkers track the relationships between the various clients (e.g., parent/child) that may impact services offered. Here is a video where Don Edwards, Assistant Agency Director, talks about the solution: YouTube Video
The Blue CRUSH solution helped the Memphis Police Department (MPD) to identify crime "hot spots" and then target these areas for increased attention. As a result, MPD has reduced violent crime without additional staffing. The solution uses IBM SPSS Predictive Analytics software to analyze crime data pertaining to type of criminal offense, time of day, day of week, location, and the weather. The solution was developed with the assistance of the University of Memphis Department of Criminology and Criminal Justice.
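The basic idea of finding "hot spots" can be shown with a very small example. This is not the SPSS model used in Memphis; it is a simplified sketch that only counts incidents per map grid cell, with a hypothetical function name and made-up coordinates. A real model would also weigh offense type, time of day, day of week, and weather.

```python
from collections import Counter


def hot_spots(incident_locations, cell_size, top_n):
    """Return the `top_n` grid cells with the most incidents as ((col, row), count)."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in incident_locations
    )
    return counts.most_common(top_n)


# Made-up (x, y) incident coordinates in arbitrary map units.
incidents = [
    (0.2, 0.3), (0.8, 0.1), (0.5, 0.9),  # three incidents in cell (0, 0)
    (2.1, 2.7), (2.4, 2.2),              # two incidents in cell (2, 2)
    (5.0, 1.0),                          # one incident in cell (5, 1)
]
top_cells = hot_spots(incidents, cell_size=1.0, top_n=2)
print(top_cells)
```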
Memphis Police Department received a National award from Nucleus
Research for this solution. They were one of only
ten companies and governmental agencies to receive the Nucleus
Research ROI award. Out of 350 technology projects that were
submitted, the Memphis Police Department was one of only two
governmental agencies to receive an award. The other governmental
agency was the US State Department.
If you'd like more information on these two case studies please contact me at email@example.com.
More information about Analytics, including our Fall Analytics Seminar Series,
can be found at www.ibm.com/ASCdc
Director, Analytics Solution Center
on the weather and climate occurring around the world in
Because weather fascinates many
of us and is experienced by all of us, the report provides good examples of how
data can be analyzed, reported, and visualized.
The first observation I'd make is that the report focuses on
unusual or anomalous events. It tries to
put them in historical context. For
example, I learned that the U.S.
had the wettest October since records were collected 115 years ago. And Toronto
had a snow-free November for the first time in recorded history. In all data analysis - weather data,
financial data, or performance data – it is important to pull out the
significant events from the rest of the data, or the “noise” as we say. NOAA does this by comparing the past year’s
data with their historical data to find out where the year stood in comparison
to all the other years. Similar analysis
can be done by other agencies whether the metric is road-miles constructed or the percent of students receiving student aid that graduated. What is key is comparing the results in light of the historical data and trying to gain insights on what the trend is and what it means.
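Placing one year in historical context can be as simple as ranking it against the record. The sketch below is a generic illustration, not NOAA's method; the function name and the sample values are hypothetical. It works the same way whether the metric is rainfall, road-miles constructed, or graduation rates.

```python
def rank_in_record(history, value):
    """Rank `value` against earlier observations, where 1 means highest on record."""
    return 1 + sum(1 for past in history if past > value)


# Made-up monthly totals for earlier years, in arbitrary units.
earlier_octobers = [2.1, 3.4, 1.8, 2.9]
this_october = 3.6
rank = rank_in_record(earlier_octobers, this_october)
years_on_record = len(earlier_octobers) + 1
print(f"Rank {rank} of {years_on_record} years on record")
```

A rank of 1 flags the year as a record; a rank near the middle suggests it is part of the "noise."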
Because 2009 was the end of the decade, they have also
compiled some data at the decade level rather than at the year level. While 2009 was
the fifth warmest year on record for average global temperature, the 2000-2009 decade was the warmest on
record for the globe. And the decade
before, 1990 – 1999, was the warmest on record at that time. The use of multi-year averages is a good
example of smoothing that can be done to help ferret out significant
information and remove the year-to-year fluctuation in large collections of
time series data. The graph showing the
decade data makes the trend very obvious (Source NOAA report, chapter 2).
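Smoothing by decade is easy to reproduce on any yearly series. The following is a minimal sketch with made-up values, not NOAA's data or code. Years are grouped into decades the way the report does (2000-2009 is one decade), and each group is replaced by its mean.

```python
from collections import defaultdict
from statistics import mean


def decade_means(yearly_values):
    """Average a {year: value} series by decade, keyed by the decade's first year."""
    groups = defaultdict(list)
    for year, value in yearly_values.items():
        groups[year // 10 * 10].append(value)
    return {decade: mean(values) for decade, values in sorted(groups.items())}


# Made-up yearly readings with year-to-year wobble around a rising trend.
yearly = {
    1990: 0.25, 1991: 0.15, 1992: 0.30, 1993: 0.10,
    2000: 0.45, 2001: 0.35, 2002: 0.50, 2003: 0.30,
}
by_decade = decade_means(yearly)
for decade, value in by_decade.items():
    print(f"{decade}s: {value:.2f}")
```

The yearly values jump up and down, but the decade means show a single clear step, which is the effect the decade chart relies on.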
Many of the charts in the report show the yearly results as
a delta from the long term average, e.g. last year’s average surface
temperature was 0.5ºC above the 1961-1990 average using NASA/GISS data. By graphing
the time series data against the long term average, the anomalies stand out. Other
charts show the actual values and it is possible to discern trends in the data.
For example, the lower tropospheric temperatures are increasing by
approximately 0.15ºC per decade. One can
use this information to predict the climate for future decades, which could
have value for policy purposes.
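Both techniques in this paragraph, anomalies against a baseline period and a per-decade trend, can be sketched in a few lines. The series below is synthetic (constructed to rise steadily) and the function names are hypothetical; the trend is an ordinary least-squares slope, which may differ from the method used in the report.

```python
def anomalies(series, baseline_start, baseline_end):
    """Return {year: value minus the mean over the inclusive baseline period}."""
    baseline = [v for y, v in series.items() if baseline_start <= y <= baseline_end]
    if not baseline:
        raise ValueError("no data in the baseline period")
    baseline_mean = sum(baseline) / len(baseline)
    return {year: value - baseline_mean for year, value in series.items()}


def trend_per_decade(series):
    """Least-squares slope of value against year, scaled to change per 10 years."""
    years = list(series)
    values = list(series.values())
    n = len(years)
    year_mean = sum(years) / n
    value_mean = sum(values) / n
    numerator = sum((y - year_mean) * (v - value_mean) for y, v in zip(years, values))
    denominator = sum((y - year_mean) ** 2 for y in years)
    return numerator / denominator * 10


# Synthetic series: rises 0.015 units per year from 1980 through 2009.
series = {1980 + i: 0.015 * i for i in range(30)}
deltas = anomalies(series, 1980, 1989)
trend = trend_per_decade(series)
print(f"2009 anomaly vs 1980-1989 baseline: {deltas[2009]:.4f}")
print(f"Trend: {trend:.2f} per decade")
```

Extending the fitted line forward gives a simple projection, with the usual caution that a straight-line fit says nothing about what drives the trend.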
The report also highlights the very strong monthly and seasonal variability in the U.S. surface temperatures in 2009 that would be obscured if one just looked at yearly averages. Another analytical technique - Modeling - can be used to help analyze the "why" behind the data. Why did 2009 show such strong variability? The report indicates that in 2009 the global climate switched from the La Nina conditions that dominated 2008 to El Nino sea surface temperature (SST) conditions in the tropical Pacific ocean. Was this the cause? NOAA global climate models were subjected to the Pacific SST observed data and the results are shown below. While not all of the variability appears to be explained by the model, the warm first quarter over the Great Plains and cold summer seem mostly consistent with the impact of La Nina during the winter and El Nino during the summer.
The State of the Climate report shows good examples of many
data analysis techniques including historical analysis, near-real time
reporting, reanalysis of past data using newer, improved techniques, averaging
of multiple datasets to improve reliability, and drill down capabilities from
decades, to years, to seasons, to months,
and from global to regional to country and state. They also use
interesting visualization techniques.
Those interested in data analysis, as well as weather,
should download this report from the NOAA Website (Arndt, D.S., M.O. Baringer, and M.R. Johnson, Eds., 2010: State of the Climate in 2009, Bull. Amer. Meteor. Soc.). Note: While NOAA does use IBM Technology in its Research, the report does not state which technology is used in the reported climate studies and I don't intend to imply any relationship between this report and IBM.
For further information on Analytics,
including our fall schedule of events, please visit the Analytics Solution
Center website at www.ibm.com/ASCdc.
If your agency uses analytics in interesting and novel ways,
I'd like to hear from you. Please write to me at ASCdc@us.ibm.com.
Frank Stein, Director
The news last week was all about the weak job market.
Fed Chairman Ben Bernanke characterized the
job market as showing “continuing weakness.”
Well, guess what?
The job market
for those with Analytics skills is very hot.
Monster has over 1000 job listings for Business Analytics jobs.
Here at IBM, we have over 100 openings for Business
Analytics and Optimization jobs. Some of
these are associated with our Public Sector Practice, consulting to Federal,
State, and Local Governments or developing data-intensive, analytics solutions
to help them perform their mission.
Why are there so many jobs in this field? Businesses and governments today must figure
out how to do more with less.
Organizations can analyze data coming from their business processes to
develop new approaches to streamlining or even optimizing their business. In the past, many decisions involved in
running an organization were based on “gut instinct.” Today, it is no longer defensible to make
decisions in this way when it is possible to make “fact-based” decisions using
hard data. Data stored in a Business
Intelligence system can be used by every level of an organization to help staff
understand their business better, detect problems, and develop solutions that
will allow them to accomplish their mission better, cheaper and faster. Sophisticated analysis can be done on the
data to predict what will happen if the current trends continue, determine how
to achieve the best outcome, and study the impact of external uncertainties
such as the economy or the weather.
According to the Bureau of Labor Statistics in their
2010-2011 Occupational Outlook Handbook,
the employment of operations
research analysts is expected to grow 22 percent over the 2008-18 period. While not all analytics jobs require an
operations research degree, this gives a good indication of the long term
trend. We know that technology is continuing
to improve both in terms of raw compute power and in the design of efficient
algorithms to analyze and optimize solutions.
This increasing capability will drive the demand to add “smarts” to many
more systems and processes, and will drive the need for analysts who can apply
the technology. So analytics isn’t just
a good short term career choice, but a good target for long-term career
To do these jobs, though, requires in-depth skills and
knowledge. Skills in operations research
(OR) techniques, data mining, optimization, decision theory, and data analysis
are needed, along with some background in IT systems. The ideal candidate will also have some
domain knowledge about government or business functional areas since it is very
hard to apply the mathematical techniques in abstraction.
How to Find Analytics Jobs
Most Analytics jobs aren’t listed under “analytics” and many
won’t even come up under that keyword.
Use search terms like “business intelligence,” “performance management,”
“optimization,” and “operations
research.” If you have experience with
actual analytics software such as Cognos, SPSS, Intelligent Miner, or ILOG,
both Monster and IBM’s website return hits on those keywords.
Want to learn more about jobs at IBM in Business Analytics?
Go to www.ibm.com/employment
and click on the “Search for Jobs at IBM” link
You may also write me at ASCdc@us.ibm.com
Analytics Solutions Center of Washington, D.C. Director