This week I am in Moscow, Russia for today's "Edge Comes to You" event. Although we had over 20 countries represented at the Edge2012 conference in Orlando, Florida earlier this month, IBM realizes that not everyone can travel to the United States. So, IBM has created the "Edge Comes to You" events where a condensed subset of the agenda is presented. Over the next four months, these events are planned in about two dozen other countries.
This is my first time in Russia, and the weather was very nice. With over 11 million people, Moscow is the 6th largest city in the world, and boasts the largest community of billionaires of any city. With this trip, I have now been to all five of the so-called BRICK countries (Brazil, Russia, India, China and Korea) in the past five years!
The venue was the [Info Space Transtvo Conference Center] not far from the Kremlin. While Barack Obama was making friends with Vladimir Putin this week at the G20 Summit in Mexico, I was making friends with the lovely ladies at the check-in counter.
If it looks like some of the letters are backwards, that is not an illusion. The Russian language uses the [Cyrillic alphabet]. The backwards N ("И"), backwards R ("Я"), the number 3 ("З"), and what looks like the big blue staple logo from NetApp ("П"), are actually all characters in this alphabet.
Having spent eight years in a fraternity during college, I found these not much different from the Greek alphabet. Once you learn how to pronounce each of the 33 characters, you can get by quite nicely in Moscow. I successfully navigated my way through Moscow's famous subway system, and ordered food on restaurant menus.
The conference coordinators were Tatiana Eltekova (left) and Natalia Grebenshchikova (right). Business is booming in Russia, and IBM just opened ten new branch offices throughout the country this month. So these two ladies in the marketing department have been quite busy lately.
I especially liked all the attention to detail. For example, the signage was crisp and clean, and the graphics all matched the PowerPoint charts of each presentation.
Moscow is close to the North Pole, similar in latitude to Juneau, Alaska; Edinburgh, Scotland; Copenhagen, Denmark; and Stockholm, Sweden.
As a result, it is daylight for nearly 18 hours a day. The first part of the day, from 8:00am to 4:30pm, was "Technical Edge", a condensed version of the 4.5 day event in Orlando, Florida. I gave three of the five keynote presentations:
- Game Change on a Smarter Planet: A New Era in IT, discussing Smarter Computing and Expert-Integrated systems, based on what Rod Adkins presented in Orlando.
- A New Approach to Storage, explaining IBM Smarter Storage for Smarter Computing, IBM's new approach to the way storage is designed and deployed for our clients
- IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy! explaining how IBM Watson technologies are being used in Healthcare and Financial Services, based on what I presented in Orlando.
(Note: I do not speak Russian fluently enough to give a technical presentation, so I did the entire presentation in English, and had real-time translators convert it to Russian for me. The audience wore headphones. However, I was able to sprinkle in a few Russian phrases, such as "доброе утро" (good morning), "Я не понимаю по-русски" (I do not understand Russian) and "спасибо" (thank you).)
After the keynote sessions, I was interviewed by a journalist for [Storage News] magazine. The questions covered a variety of topics, from the implications of [Big Data analytics] to the future of storage devices that employ [Phase Change Memory]. I look forward to reading the article when it gets published!
The afternoon had break-out sessions in three separate rooms. Each room hosted seven topics, giving the attendees plenty to choose from for each time slot. I presented one of these break-out sessions, Big Data Cloud Storage Technology Comparison. The title was already printed in all the agendas, so we went with it, but I would have rather called it "Big Data Storage Options". In this session, I explained Hadoop, InfoSphere BigInsights, and internal and external storage options.
I spent some time comparing the Hadoop Distributed File System (HDFS) with IBM's own General Parallel File System (GPFS), which now offers Hadoop interfaces in a Shared-Nothing Cluster (SNC) configuration. IBM GPFS is about twice as fast as HDFS for typical workloads.
At the end of the Technical Edge event, there was a prize draw. Business cards were drawn at random, and three lucky attendees won a complete four-volume set of my book series "Inside System Storage"! Sadly, these got held up in customs, so we provided a "certificate" to redeem for the books when they arrive at the IBM office.
The second part of the day, from 5:00pm to 8:00pm, was "Executive Edge", a condensed version of the 2 day event in Orlando, designed for CIOs and IT leaders. Holding this event in the evening allowed busy executives to come over after spending the day in the office. I presented IBM Storage Strategy in the Smarter Computing Era, similar to my presentation in Orlando.
Both events were well-attended. Despite fighting jet lag across 11 time zones, I managed to hang in there for the entire day. I got great feedback and comments from the attendees. I look forward to hearing how the other "Edge Comes to You" events fare in the other countries. I would like to thank Tatiana and Natalia for their excellent work organizing and running this event!
technorati tags: IBM, Moscow, Russia, Edge, ECTY, Cyrillic, Tatiana Eltekova, Natalia Grebenshchikova, Smarter Storage, Smarter Computing, Smarter Planet, Big Data, Cloud, IBM Watson, Jeopardy, Hadoop, HDFS, InfoSphere, BigInsights, GPFS, GPFS-SNC
Last February, IBM introduced Watson on the Jeopardy! game show. These three shows were re-aired this week in the United States. I wrote a series of blog posts back then:
This last one on how to build your own Watson, Jr. has gotten over 69,000 hits! While several people told me they plan to build their own, I have not heard back from anyone yet, so perhaps it is taking longer than expected.
IBM and Wellpoint announced this week that they will be [putting Watson to work] in healthcare. [Wellpoint] is one of the largest health benefits companies in the United States, with over 70 million people served through its affiliate plans and various subsidiaries. I am one of the development lab advocates for Wellpoint, and have been proud to work with the account team to help Wellpoint achieve their goals.
This marks the first commercial deployment of IBM Watson. This is a joint effort. IBM will develop the base IBM Watson for healthcare platform, and Wellpoint will then develop healthcare-specific solutions to run on this platform. Watson's ability to analyze the meaning and context of human language, and quickly process vast amounts of information to suggest options targeted to a patient's circumstances, can assist decision makers, such as physicians and nurses, in identifying the most likely diagnosis and treatment options for their patients.
Is this going to put doctors out of business? No. Physicians find it challenging to read and understand hundreds or thousands of pages of text and put this into their practice. IBM Watson, on the other hand, can scan through hundreds of millions of pages in just a few seconds to help answer a question or provide recommendations. Together, doctors armed with access to IBM Watson will be able to improve the quality and effectiveness of medical care.
From an insurance point of view, improving the quality of care will help reduce medical mistakes and malpractice lawsuits. This is a win-win for everyone except ambulance-chasing lawyers!
technorati tags: IBM, Wellpoint, Healthcare, Watson, Jeopardy
The IBM Challenge was a big success. One of the contestants, Ken Jennings, [welcomes our new computer overlords]. Congratulations are in order to the IBM Research team who pulled off this Herculean effort!
Some folks have poked fun at some of the odd responses and wager amounts from the IBM Watson computer during the three-day tournament. Others were as surprised as I was that this impressive feat was done with less than 1TB of stored data. Here is what John Webster wrote in CNET yesterday, in his article [What IBM's Watson says to storage systems developers]:
"All well and good. But here's what I find most interesting as a result of what IBM has done in response to the Grand Challenge that motivated Watson's creators. We know, from Tony Pearson's blog, that the foundation of Watson's data storage system is a modified IBM SONAS cluster with a total of 21.6TB of raw capacity. But Pearson also reveals another very significant, and to me, surprising data point: "When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte."
What Pearson just said is that the data set Watson actually uses to reach his push-the-button decision would fit on a 1TB drive. So much for big data?"
To better appreciate how difficult the challenge was, and how a small amount of data can answer a billion different questions, I thought I would cover Business Intelligence, Data Retrieval and Text Mining concepts.
Let's start with Business Intelligence.
[Seth Grimes] pointed me to this quote from [A Business Intelligence System], written by Hans Peter Luhn in the October 1958 issue of the IBM Journal.
"In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
Ideally, when you need "Business Intelligence" to help you make a better decision, you perform data retrieval from a structured database for the specific information you are looking for. In other cases, you might be looking for insight, patterns or trends. In that case, you go "data mining" against your structured databases.
Here's a simple example. John runs a fruit stand. One day, he kept track of how many apples and oranges were bought by men and women. How many questions can we ask against this small set of data? Let's count them:
- How many apples were sold to men?
- How many apples were sold to women?
- How many oranges were sold to men?
- How many oranges were sold to women?
But wait! For each row and column, we can combine them into totals.
- How many apples were sold in total?
- How many oranges were sold in total?
- How many fruit in total were sold to men?
- How many fruit in total were sold to women?
- How many fruit in total were sold?
But wait, there's more! Each row and column can be evaluated for relative percentages, as well as percentages of each cell compared to the total. (The totals row of the original table read: 63, 50%; 63, 50%; 126 in total.) You could make five relevant pie-charts from this data. This results in 16 more questions, such as:
- Of the fruit purchased by men, what percentage were apples?
- Of all the apples purchased, what percentage were purchased by women?
And that's not including more ethereal questions, such as:
- Are there gender-specific preferences for different types of fruit?
- What type of fruit do men prefer?
This is just for a small set, two market segments (by gender) and two products (apples and oranges). However, if you have many market segments (perhaps by age group, zip code, etc.) and many products, the number of queries that can be supported is huge. For small sets of data, you can easily do this with a spreadsheet program like IBM Lotus Symphony or Microsoft Excel.
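To make the counting concrete, here is a minimal Python sketch of John's one-day tally. The individual cell counts are invented for illustration; each of the roll-up questions above becomes a single call:

```python
# John's one-day tally as a tiny "fact table". Cell counts are invented.
sales = {
    ("apples", "men"): 30,
    ("apples", "women"): 25,
    ("oranges", "men"): 33,
    ("oranges", "women"): 38,
}

def total(fruit=None, gender=None):
    """Sum over whichever dimensions are left unspecified."""
    return sum(units for (f, g), units in sales.items()
               if fruit in (None, f) and gender in (None, g))

print(total("apples", "men"))   # apples sold to men
print(total("oranges"))         # all oranges sold
print(total(gender="women"))    # all fruit sold to women
print(total())                  # grand total
# The percentage questions reuse the same totals:
print(round(100 * total("apples", "men") / total(gender="men"), 1))
```

The nine count questions and the 16 percentage questions all reduce to different combinations of the same one-line roll-up.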
(Photo courtesy of [OLAP, Cubes and Multidimensional Analysis] by Andrew Fryer.)
But why limit yourself to two dimensions? The above example was just for one day's worth of activity; if John captures this data every day for historical and seasonal trending, it can be represented as a three-dimensional cube. The number of possible queries becomes astronomical. This is the basis for Online Analytical Processing (OLAP), and three-dimensional tables are often referred to as [OLAP cubes].
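As a rough sketch of the cube idea (the day dimension and all counts below are invented), each OLAP-style query pins some dimensions and sums over the rest:

```python
# Invented three-dimensional tally: (fruit, gender, day) -> units sold.
cube = {
    ("apples", "men", "Mon"): 12, ("apples", "men", "Tue"): 18,
    ("oranges", "women", "Mon"): 20, ("oranges", "women", "Tue"): 18,
}

def rollup(fruit=None, gender=None, day=None):
    """Pin any subset of dimensions (a "slice"); sum over the rest."""
    return sum(units for (f, g, d), units in cube.items()
               if fruit in (None, f) and gender in (None, g) and day in (None, d))

print(rollup(day="Mon"))        # one slice of the cube
print(rollup(fruit="apples"))   # another slice
print(rollup())                 # the grand total
```

Every added dimension multiplies the number of possible slices, which is why the query space grows so quickly.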
IBM's Edgar Codd published the relational model back in 1970, and IBM researchers went on to invent the Structured Query Language [SQL]. Today, nearly all modern relational databases support it, including IBM DB2, Informix, Microsoft SQL Server, and Oracle DB. SQL poses two challenges. First, you have to structure the data in advance according to the way you expect to perform your ad-hoc queries. Deciding the groups and categories in advance can limit the way information is recorded and captured.
Second, you have to be skilled at SQL to phrase your queries correctly to retrieve the data you are after. What ended up happening was that skilled SQL programmers would develop "canned reports" with fixed SQL parameters, so that less-skilled business decision makers could base their decisions on these reports.
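The "canned report" idea can be sketched with Python's built-in sqlite3 module; the table layout and cell counts below are invented for illustration:

```python
# A hedged sketch of a "canned report": the fruit-stand data expressed as a
# SQL table, queried with a fixed statement a decision maker can simply read.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (fruit TEXT, gender TEXT, units INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("apples", "men", 30), ("apples", "women", 25),
                 ("oranges", "men", 33), ("oranges", "women", 38)])

# The skilled SQL programmer writes this once; the report consumer never
# needs to know GROUP BY exists.
rows = list(con.execute(
    "SELECT fruit, SUM(units) AS total FROM sales GROUP BY fruit ORDER BY fruit"))
for fruit, total in rows:
    print(fruit, total)
```

The structure had to be decided up front (fruit, gender, units), which is exactly the first challenge: any question outside those columns cannot be asked at all.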
IBM has fully integrated stacks to help process structured data, combining servers, storage, and advanced analytics software into a complete appliance. IBM offers the [Smart Analytics System] for robust, customized deployments, and recently acquired [Netezza] for pre-configured, and more rapid deployments.
However, the bigger problem is that more than 80 percent of information is not structured!
Semi-structured data like email provides some searchable fields like From and Subject. The rest of the information is unstructured, such as text files, photographs, video and audio. To look for specific information in unstructured sources can be like looking for a needle in a haystack, and trying to get insight, patterns or trends involves text mining.
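As a toy illustration of why unstructured search is harder (this is nothing like IBM's actual products, just a sketch over invented memos), a naive text-mining approach ranks documents by how many query terms they contain:

```python
# Toy keyword search over unstructured "memos" (contents invented).
# Real text-mining systems go far beyond counting matching terms.
import re

docs = {
    "memo1": "The quarterly fruit sales exceeded forecasts in every region.",
    "memo2": "Server maintenance is scheduled for the weekend.",
    "memo3": "Orange and apple sales were strong among male shoppers.",
}

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def search(query):
    """Rank documents by how many query terms they contain; drop zero scores."""
    terms = words(query)
    scores = {name: len(terms & words(text)) for name, text in docs.items()}
    return sorted((n for n, s in scores.items() if s > 0),
                  key=lambda n: -scores[n])

print(search("apple sales"))
```

Notice there are no fields to query: relevance has to be inferred from the text itself, and simple term matching misses synonyms, context and meaning entirely.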
IBM is a leader in Business Analytics and has made great progress in dealing with unstructured data. This includes [IBM OmniFind Enterprise Edition], [IBM e-Discovery Manager] and [IBM Cognos Business Intelligence].
This, in effect, is what IBM Watson was able to perform so well this week. Finding the needle in the haystacks of unstructured data from 200 million pages of text stored in its system, combined with the ability to apprehend the interrelationships of meaning and subtle nuance, resulted in an impressive technology demonstration. Certainly, this new technology will be powerful for a variety of use cases across a broad set of industries!
To learn more, read the Arizona Daily Star's article [After 'Jeopardy!' win, IBM program steps out].
technorati tags: IBM, Watson, Jeopardy, Challenge, John Webster, CNET, BI, data mining, Text Mining, OLAP, Arizona, Daily Star
IBM Challenge: Tucson Watch Event for Jeopardy!
The Tucson Executive Briefing Center hosted 20 dignitaries from local companies and academia.
This is a historic competition, an exhibition match pitting a computer against the top two celebrated Jeopardy champions:
- Brad Rutter, who won over $3.2 million USD on Jeopardy!, winning five days on the show and then three later tournaments.
- Ken Jennings, who won $2.5 million in a 74-game winning streak on Jeopardy!
One of the members of the audience had never seen an episode of Jeopardy! in his life.
(Note: there are NO SPOILERS in this blog post. If you have not yet watched the show, you are safe to continue reading the rest of this post. I will not disclose the correct responses to any of the clues nor how well each contestant scored.)
- Opening Remarks
Calline Sanchez, IBM Director, Systems Storage Development for Data Protection and Retention, kicked off today's ceremonies.
The IBM Watson computer, named after IBM founder Thomas J. Watson, has been developed over the past 4 years by a team of IBM scientists who set out to accomplish a grand challenge - build a computing system that rivals a human's ability to answer questions posed in natural language with speed, accuracy and confidence. IBM Research labs in the United States, Japan, China and Israel [collaborated with Artificial Intelligence (AI) experts at eight universities], including Massachusetts Institute of Technology (MIT), University of Texas (UT) at Austin, University of Southern California (USC), Rensselaer Polytechnic Institute (RPI), University at Albany (UAlbany), University of Trento (Italy), University of Massachusetts Amherst, and Carnegie Mellon University.
(Disclaimer: I attended the University of Texas at Austin. My father attended Carnegie Mellon University.)
Last week, NOVA on PBS had a special episode on the making of IBM Watson, you can [watch it online] on their website. Delaney Turner, IBM Social Media Communications Manager for Business Analytics Software, has posted [his observations of Nova].
Since IBM Watson is the size of 10 refrigerators and weighs over 14,000 pounds, it was easier to design the Jeopardy! set at the TJ Watson Research lab in Yorktown Heights, NY, than to ship it over to California where the show is normally recorded. Two of the visual designers who worked on this set, as well as on the visual appearance of Watson, live in Tucson and were part of our audience today.
The IBM Challenge consists of a two-game tournament, where the scores of both games will be added to determine the final rankings. The producers of Jeopardy! will award $1 million USD to first place, $300,000 to second place, and $200,000 to third place. Regardless of the outcome, [IBM will donate all of its winnings to charity]. The two human contestants plan to donate half of their earnings to their favorite charities as well.
- Jeopardy! The IBM Challenge
Alex Trebek introduces IBM Watson, explaining that it can neither hear nor see; it receives all information electronically. Categories and clues are sent as text files via TCP/IP over Ethernet at the same moment the two human contestants see them, so that all three have the same time to think about the correct response.
Watson has two rows of five racks, back to back. This was done so that cold air could rise up from holes in the tile floors around the unit, and all the hot air would be forced into the center and up to the ceiling return. This technique is known as "hot aisle/cold aisle" design. Alex Trebek opens one of the rack doors to show a series of 4U-high IBM Power 750 servers.
The avatar is a representation of Watson, as the machine itself is too big to fit behind the podium. The avatar is IBM's "Smarter Planet" logo with orbiting streaks and circles. It glows green when it has high confidence, and orange when it gets an answer wrong. When busy thinking, the streaks and circles speed up, the closest we will see to "watching a computer sweat."
During the show, an "Answer panel" shows Watson's top three candidate responses, with confidence level compared to its current "buzz threshold".
Watson knows what it knows, and knows what it doesn't know. Here is an [Interactive Watson Game] on New York Times website to give you an idea of how the answer panel works. I was impressed with how close all three candidate answers were. In a question about Olympic swimmers, all three candidates are Olympic swimmers. In a question about the novel "Les Miserables", all three candidates were characters of that novel.
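The answer-panel behavior can be sketched as a comparison of the top candidate's confidence against the buzz threshold. The swimmer names and confidence values below are invented for illustration, not Watson's actual numbers:

```python
# Sketch of the "answer panel": show the top three candidate responses, and
# decide to buzz only when the best confidence clears the buzz threshold.
def answer_panel(candidates, buzz_threshold):
    """candidates: list of (response, confidence between 0 and 1)."""
    top3 = sorted(candidates, key=lambda c: -c[1])[:3]
    best_response, confidence = top3[0]
    return top3, confidence >= buzz_threshold

panel, buzz = answer_panel(
    [("Michael Phelps", 0.83), ("Mark Spitz", 0.61),
     ("Ian Thorpe", 0.44), ("Johnny Weissmuller", 0.12)],
    buzz_threshold=0.50)
for response, confidence in panel:
    print(response, confidence)
print("Buzz!" if buzz else "Stay silent")
```

This is what "knows what it doesn't know" means in practice: when no candidate clears the threshold, the machine simply stays silent rather than guess.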
- Closing Remarks
Well, IBM Watson did well overall, but answered some questions incorrectly. This [parody Slate video] pokes fun at this. Here were some discussions we had after the show ended:
- Watson did not do well in categories that required [abductive reasoning]. For example, to identify two or three things that happened in different years, and then postulate that what they all have in common is a specific decade (such as the 1950s), is difficult.
- Watson does not hear the wrong answers from the two human contestants. For one clue, Ken buzzes in first, guesses wrong, then Watson buzzes in with the exact same response. Alex Trebek rebukes Watson with "No, Ken just said that!" Brad, meanwhile, learns from their mistakes and guesses correctly for the score.
- Watson is provided the correct answer after a contestant guesses it correctly, or if nobody does, when Alex provides the correct response. This is sent as a text message to Watson immediately, so that it can use this information to adjust its algorithms and machine-learning for future clues in that same category. This was evident in the "Answer panel" on the fourth and fifth attempts on the category of "Decades".
With this demonstration, IBM Research has advanced science by leaps and bounds for the Artificial Intelligence community. IBM is a leader in Business Analytics, and this technology will find uses in a variety of industries. The average knowledge worker spends 30 percent of her time looking for information in corporate data repositories. By demonstrating a computer that can provide answers quickly, employees will be more productive, make stronger business decisions, and have greater insight.
Day 1 was only able to cover the first round of Game 1. This allowed more time to talk about the history and technology of IBM Watson. Tomorrow, the contestants will finish Game 1 and head into Game 2.
technorati tags: IBM, Challenge, Watson, Jeopardy, Calline Sanchez, Delaney Turner
Homework Assignment: My Results
We are only days away from the big IBM Challenge, pitting the Watson computer against two human contestants on the show Jeopardy!
I watched two episodes of Jeopardy! on my TiVo, pausing it to follow the [homework assignment] I suggested in my last post. Here are my own results and observations.
- Game 1
The episode involved a web programmer, a customer service representative, and a bank teller.
- Round 1
Of the first six categories in Round 1, I correctly guessed four of the six themes. For the category "Diamonds are Forever", I wrote down "All answers are some kind of gem or mineral", but in reality all the answers were physical characteristics of diamonds specifically. For the category "...Fame is not", I wrote down "All answers are TV or Movie celebrities". I was close, but actually it was famous celebrities, rock bands and pop culture of the 1980s. (The movie "Fame" came out in 1980.)
In this round, 27 of the 30 answers were revealed before they ran out of time. Of these, I was able to get 24 of the 27 correct by searching the Internet. That is 88 percent correct. Here were the ones that eluded me:
- The answer related to a "multi-chambered mollusk". I could not find anything definitive on the Internet, so I abstained from wagering. The correct question was "What is Nautilus?"
- The answer was the Irish variant of "Kathryne". I found Kathleen as a variant, but did not investigate whether it had Irish origins. The correct question was "What is Caitlin?"
- The answer was the Norse name for "ruler", whether you had red hair or not. I found "Roy" and "Rory", so guessed "What is Rory?" The correct question was "What is Eric?"
- Round 2
In the second round, I guessed three of the six themes for the categories. For the category "Musical Titles Letter Drop" I wrote down "All the answers are titles of musical songs", but it was actually "Musicals" as in the Broadway shows. For the category "Place called Carson", I wrote down "All the answers are places" and was way off on that one, with answers that were people, places and names of corporations. And for "State University Alums", I wrote down "All the answers are college graduates", but instead they were all state universities, such as the University of Arizona.
In this second round, only 26 answers were posed. I got 80 percent correct with Internet searching. I missed three on the "Musical Titles", one in "Pope-pourri" and one State University (sorry, SMU). The "Musical Titles Letter Drop" category was especially difficult: for each title of a musical, you had to remove a single letter to form the correct response.
- For the answer "Good luck when you ask the singers "What I Did For Love"; they never tell the truth", you would need to take "Chorus Line", the musical where the song "What I Did for Love" appears, and ask "What is Chorus Lie?" Note that "Line" changed to "Lie": the letter "n" was dropped.
- For the answer "Embrace the atoms as Simba and company lose and gain electrons en masse in this production", you would need to recognize that Simba is the main character of "The Lion King" and change it to "What is The Ion King?" Note that the letter "L" was dropped.
I think these plays on words are the kinds of clues that would stump the IBM Watson computer.
- Round 3
In the final round, the category was "Ancient Quotes". I thought the answer would be a famous adage or quotation, but it was instead the famous people who uttered those phrases. The answer was "He said, to leave this stream uncrossed will breed manifold distress for me; to cross it, for all mankind". I was able to determine the correct response readily by searching the Internet: the river was the Rubicon, the border of the Gaul region governed by an ambitious general. The correct response was "Who was Julius Caesar?"
Total time for the entire exercise: 87 minutes.
- Game 2
The following night's episode brought back Paul Wampler, the returning champion web programmer, against two new contestants: an actor and a high school principal.
- Round 1
Of the first six categories in Round 1, I correctly guessed five of the six themes. For the category "Nonce Words", I wrote that all the answers would be nonsense words. I was close; the clues contained words invented for a particular occasion, but the correct responses did not.
I was able to get 29 of 30 correct by searching the Internet. That is 96 percent correct. The one I missed was in the category "Nonce Words", where the answer was "In an arithmocracy, this portion of the population rules, not trigonometry teachers." My response was "What is math?" but the correct response was "What are the majority?" It did not occur to me to even look up [Arithmocracy] as a legitimate word, but it is real.
- Round 2
In the second round, I guessed five of the six themes for the categories. For the category "Hawk" Eyes, the word "Hawk" was in quotation marks, so I wrote "All answers would start with the word Hawk or end with the word eyes". I was close; the correct theme was that the word "hawk" would appear at the front, middle or end of the correct response.
In this second round, I got 28 of 30 correct, or 93 percent, with Internet searching. Ironically, it was the category "German Foods" that caught me off guard.
- The answer was "Pichelsteiner Fleisch, a favorite of Otto von Bismarck, is this one-pot concoction, made with beef & pork". I know that "Fleisch" is the German word for meat, so I guessed "What is sausage?" but the correct response was "What is stew?" I should have paid more attention to the "one-pot concoction" part of the answer.
- The answer was "Mimi Sheraton says German stuffed hard-boiled eggs are always made with a great deal of this creamy product". I didn't realize that "stuffed eggs" was German for "deviled eggs". Instead, I found Mimi Sheraton's "The German Cookbook" on Google Books, and jumped to the page for "Stuffed Eggs". The ingredients I read included whipped cream, cognac, and Worcestershire sauce. Taking the "creamiest" ingredient of these, I wrote down "What is whipped cream?" However, it turned out I was actually reading the ingredients for "Crabmeat Cocktail", continuing from the previous page. I thought it was gross to put whipped cream with eggs, and should have known better. The correct response was "What is mayonnaise?"
- Round 3
In the final round, the category was "Political Parties". This could mean either political organizations like the Republicans and Democrats, or festivities like the White House Correspondents' Dinner. The answer was "Only one U.S. president represented this party, and he said, I dread...a division of the republic into two great parties." So, we can figure out the answer refers to political organizations, but both the Democratic and Republican parties are ruled out because each has had multiple presidents. So, looking at a [List of Political Parties of each US President], I found that there were four presidents in the Whig party, four in the Democratic-Republican party, but only one president in the Federalist party (John Adams), and one in the National Union party (Andrew Johnson). Looking at [famous quotes from John Adams] first, I found the quote, it matched, and so I wrote down "What is the Federalist party?" I got it right, as did two of the three contestants. Ironically, the one contestant who got it wrong, the returning champion web programmer, had wagered a small amount, so he still had more money after the round and won the game overall.
Total time for the entire exercise: 75 minutes. I was able to do this faster as I skipped searching the internet for the responses I was confident on.
To find out when Jeopardy is playing in your town, consult the [Interactive Map].
technorati tags: IBM, Challenge, Watson, Jeopardy, homework