"When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte (TB). For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained, Watson is NOT going to the internet searching for answers."
I had several readers ask me to explain the significance of the "Terabyte". I'll work my way up.
For those of us in the IT industry, 1TB is small potatoes. I, for one, was expecting it to be much bigger. But for everyone else, the equivalent of 200 million pages of text that IBM Watson has loaded inside is an incredibly large repository of information. I suspect IBM Watson probably contains the complete works of Shakespeare as well as other fiction writers, the IMDB database, all 3.5 million articles of Wikipedia, religious texts like the Bible and the Quran, famous documents like the Magna Carta and the US Constitution, and reference books like a Dictionary, a Thesaurus, and "Gray's Anatomy". And, of course, lots and lots of lists.
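As a rough back-of-the-envelope check on that comparison, here is a small sketch. It assumes decimal units (1TB = 10^12 bytes) and treats the whole terabyte as page text, which is my simplification, since the IBM Research figure also covers indexes and knowledge bases.

```python
# Back-of-the-envelope: what page size does "1TB is about 200 million pages" imply?
# Assumptions (mine, not IBM's): decimal terabyte, all of it counted as page text.
TERABYTE_BYTES = 10**12
PAGES = 200_000_000

bytes_per_page = TERABYTE_BYTES // PAGES
print(f"{bytes_per_page:,} bytes per page")  # prints "5,000 bytes per page"
```

Roughly 5,000 characters per page is in the range of a densely printed page of plain text, so the two figures are at least consistent with each other.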
For those on Twitter, follow [@ibmwatson] these next three days during the challenge.
IBM Challenge: Tucson Watch Event for Jeopardy!
The Tucson Executive Briefing Center hosted 20 dignitaries from local companies and academia. This is a historic competition, an exhibition match pitting a computer against the top two celebrated Jeopardy champions:
One of the members of the audience had never seen an episode of Jeopardy! in his life.
(Note: there are NO SPOILERS in this blog post. If you have not yet watched the show, you are safe to continue reading the rest of this post. I will not disclose the correct responses to any of the clues nor how well each contestant scored.)
Day 1 was only able to cover the first round of Game 1. This allowed more time to talk about the history and technology of IBM Watson. Tomorrow, the contestants will finish Game 1 and head into Game 2.
Happy [Valentine's Day] everyone! Love is in the air! There was plenty of evidence of this everywhere I looked:
Sadly, only 70 percent of doctors in the United States use Electronic Medical Record [EMR] systems. My own Primary Care Physician has made the switch, and told me how much he loves having ready access to the information he needs. EMR systems reduce costs, help manage risk, and improve healthcare outcomes. It is no surprise that the U.S. government has taken a [stick-and-carrot approach] to encourage doctors to use them.
Two years ago this week, [IBM Watson won the Grand Challenge] on the popular Jeopardy! game show. I wrote [a series of blog posts on IBM Watson]. To date, there have been over 90,000 downloads for my now infamous step-by-step instructions on [How to build your own "Watson Jr." in your basement]!
A frequent topic at the Tucson Executive Briefing Center where I work is how to make the most use of IT for healthcare and life sciences. For much of 2011 and 2012, I was also one of the technical advocates assigned to Wellpoint Insurance, in support of their adoption of IBM Watson technology for healthcare.
Consider [Oncology], the branch of medicine focused on cancer. IBM has just released a new 8-minute YouTube video [IBM Watson Demo: Oncology Diagnosis and Treatment] that shows how IBM Watson is being put to use at [Memorial Sloan-Kettering], a world-class cancer treatment facility.
This is just one of the many [IBM Smarter Healthcare solutions] that is helping to build a smarter planet!
The IBM Challenge was a big success. One of the contestants, Ken Jennings, [welcomes our new computer overlords]. Congratulations are in order to the IBM Research team who pulled off this Herculean effort!
Some folks have poked fun at some of the odd responses and wager amounts from the IBM Watson computer during the three-day tournament. Others were as surprised as I was that the impressive feat was done with less than 1TB of stored data. Here is what John Webster wrote in CNET yesterday, in his article [What IBM's Watson says to storage systems developers]:
"All well and good. But here's what I find most interesting as a result of what IBM has done in response to the Grand Challenge that motivated Watson's creators. We know, from Tony Pearson's blog, that the foundation of Watson's data storage system is a modified IBM SONAS cluster with a total of 21.6TB of raw capacity. But Pearson also reveals another very significant, and to me, surprising data point: "When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte."
To better appreciate how difficult the challenge was, and how a small amount of data can answer a billion different questions, I thought I would cover Business Intelligence, Data Retrieval and Text Mining concepts.
"In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
Ideally, when you need "Business Intelligence" to help you make a better decision, you perform data retrieval from a structured database for the specific information you are looking for. In other cases, you might be looking for insight, patterns or trends. In that case, you go "data mining" against your structured databases.
And that's not including more ethereal questions, such as:
This is just for a small set, two market segments (by gender) and two products (apples and oranges). However, if you have many market segments (perhaps by age group, zip code, etc.) and many products, the number of queries that can be supported is huge. For small sets of data, you can easily do this with a spreadsheet program like IBM Lotus Symphony or Microsoft Excel.
Second, you had to be skilled at SQL to phrase your queries correctly to retrieve the data you are after. What ended up happening was that skilled SQL programmers would develop "canned reports" with fixed SQL parameters, so that less-skilled business decision makers could base their decisions from these reports.
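To make the difference between a "canned report" and an ad hoc query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sales figures are all hypothetical, invented only for illustration of the apples-and-oranges example.

```python
import sqlite3

# Hypothetical sales table; the columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (gender TEXT, product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("F", "apples", 30),
        ("F", "oranges", 45),
        ("M", "apples", 50),
        ("M", "oranges", 20),
    ],
)

def canned_report(conn):
    """A 'canned report': fixed SQL that totals quantity per segment and product."""
    sql = """
        SELECT gender, product, SUM(quantity)
        FROM sales
        GROUP BY gender, product
        ORDER BY gender, product
    """
    return conn.execute(sql).fetchall()

def units_sold(conn, gender, product):
    """Ad hoc data retrieval: one specific question, with parameters bound safely."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM sales "
        "WHERE gender = ? AND product = ?",
        (gender, product),
    ).fetchone()
    return row[0]

report = canned_report(conn)
for line in report:
    print(line)
print(units_sold(conn, "F", "apples"))
```

The canned report answers the same fixed question every time, which is what made it usable by people who did not know SQL. The parameterized function is closer to what a skilled SQL programmer would write on request.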
IBM has fully integrated stacks to help process structured data, combining servers, storage, and advanced analytics software into a complete appliance. IBM offers the [Smart Analytics System] for robust, customized deployments, and recently acquired [Netezza] for pre-configured, more rapid deployments.
However, the bigger problem is that more than 80 percent of information is not structured! Semi-structured data like email provides some searchable fields like From and Subject. The rest of the information is unstructured, such as text files, photographs, video and audio. To look for specific information in unstructured sources can be like looking for a needle in a haystack, and trying to get insight, patterns or trends involves text mining.
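Here is a small sketch of what "semi-structured" means in practice, using Python's standard email module. The message, addresses and keyword are made up for this example. The header fields can be looked up by name, while the body is free text that can only be searched.

```python
from email import message_from_string

# Illustrative message; the addresses and text are invented for this sketch.
raw = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly fruit sales

Hi Bob, apples outsold oranges again this quarter.
"""

msg = message_from_string(raw)

# Structured part: header fields are retrieved by name.
sender = msg["From"]
subject = msg["Subject"]

# Unstructured part: the body is free text, so we fall back to keyword search.
body = msg.get_payload()
mentions_apples = "apples" in body.lower()

print(sender, "|", subject, "|", mentions_apples)
```

A keyword match like this finds the needle only if you already know what word to look for. Text mining has to go further and work out what the sentence actually says.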
IBM is a leader in Business Analytics and has made great progress in dealing with unstructured data. This includes [IBM OmniFind Enterprise Edition], [IBM e-Discovery Manager] and [IBM Cognos Business Intelligence].
This, in effect, is what IBM Watson was able to perform so well this week. Finding the needle in the haystacks of unstructured data from 200 million pages of text stored in its system, combined with the ability to apprehend the interrelationships of meaning and subtle nuance, resulted in an impressive technology demonstration. Certainly, this new technology will be powerful for a variety of use cases across a broad set of industries!
To learn more, read the Arizona Daily Star's article [After 'Jeopardy!' win, IBM program steps out].
Last week, I got the following comment from Bob Swann:
I am looking for the IBM VM Poster or a picture of the IBM VM "Catch the Wave"
Well, Bob, I made some phone calls. The company that published these posters no longer exists, but I found a coworker at the Poughkeepsie Briefing Center who still had the poster on his wall, and he was kind enough to take a picture of it for you.
Some may recognize this as a [mash-up] using as a base the famous Japanese 10-inch by 15-inch block print [The Great Wave off Kanagawa] by artist [Katsushika Hokusai]. I had this as my laptop's wallpaper screen image until last year when I was presenting in Kuala Lumpur, Malaysia. I was told that it reminded people about the horrible tsunami caused by the [Indian Ocean earthquake] back in 2004. I was actually scheduled to fly the last week of December 2004 to Jakarta, Indonesia, but at the last minute our client team changed plans. I would have been en route over the Pacific Ocean when the tsunami hit, and probably stranded over there for weeks or months until the airports re-opened.
The Wave theme was in part to honor the IBM users group called World Alliance VSE VM and Linux (WAVV), which is having their next meeting [April 18-22, 2008] in Chattanooga, Tennessee. I presented at this conference back in 1996 in Green Bay, Wisconsin, as part of the IBM Linux for S/390 team. It started on the Sunday that Wisconsin switched their clocks for [Daylight Saving Time], and the few of us from Arizona or other places that don't bother with this all showed up for breakfast an hour early.
When I was in Australia last year, I was told the wave that sports fans do, by raising their hands in coor...
The "wave" represents a powerful metaphor, from the z/VM operating system on System z mainframes to VMware and Xen on Intel-based processor machines, as the direction of virtualization that we are heading for future data centers. The Mexican wave represents a glimpse of what humans can accomplish with collaboration on a global scale. It can also represent the tidal wave of data arising from nearly 60 percent annual growth in storage capacity. (I had to mention storage eventually, to avoid being completely off-topic on this post!)
I hope this is the graphic you were looking for, Bob. If anyone else has wave-themed posters they would like to contribute, please post a comment below.
technorati tags: Bob Swann, IBM poster, z/VM, Japanese, Great Wave, Kanagawa, Katsushika Hokusai, Kuala Lumpur, Malaysia, Indian Ocean, Jakarta, Indonesia, WAVV, Mexican Wave, storage, capacity, growth, Linux, Melbourne, Australia, VMware, Xen
In case you missed it, IBM unveiled a new digital video surveillance service yesterday. This "marks an important shift in the industry's approach to security, applying advanced analytics to video data and signaling the ability to converge physical and information technology (IT) security."
The IBM Smart Surveillance Solution is designed to provide the unique capability to carry out efficient data analysis of video sequences either in real time or from recordings. These recordings can be on disk or tape storage.
The problem with today's existing "analog" surveillance is that the analog cameras record onto traditional VHS tapes, and these are rotated through, re-written after a few hours or days. To review tapes often involves human intervention, and must be done before the VHS tapes are re-used. Many shoplifters, thieves, and other law-breakers take a chance that their actions will not be caught on tape, or that they will be long gone by the time the video is analyzed.
The IBM Smart Surveillance Solution can provide a number of advantages over traditional video solutions, including:
With real-time analytics capabilities, the new DVS service can open up a wide array of new applications that go far beyond the traditional security aspects of surveillance systems. Early adopter industries in this rapidly evolving market include retail, public sector and financial services. The retail industry estimates nearly $50 billion is lost annually to fraud, theft and administrative errors.
Once in digital format, video surveillance can be sent further, processed quicker, and stored for longer periods of time, than traditional media makes practical today.
Beyond fraud and theft, this kind of solution could also help identify bullies who make death threats in High School.
Well, it's Tuesday, and that means IBM announcements! Today is bigger, as there are a lot of Dynamic Infrastructure announcements throughout the company with a common theme, cloud computing and smart business systems that support the new way of doing things. Today, IBM announced its new "IBM Smart Archive" strategy that integrates software, storage, servers and services into solutions that help meet the challenges of today and tomorrow. IBM has been spending the past few years working across its various divisions and acquisitions to ensure that our clients have complete end-to-end solutions.
IBM is introducing new "Smart Business Systems" that can be used on-premises for private-cloud configurations, as well as by cloud-computing companies to offer IT as a service. IBM [Information Archive] is the first to be unveiled, a disk-only or blended disk-and-tape Information Infrastructure solution that offers a "unified storage" approach with amazing flexibility for dealing with various archive requirements:
The Information Archive has all the server, storage and software integrated together into a single machine type/model number. It is based on IBM's General Parallel File System (GPFS) to provide incredible scalability, the same clustered file system used by many of the top 500 supercomputers. Initially, Information Archive will support up to 304TB raw capacity of disk and Petabytes of tape. You can read the [Spec Sheet] for other technical details.
For those who prefer a more "customized" approach, similar to IBM Scale-Out File Services (SoFS), IBM has [Smart Business Storage Cloud]. IBM Global Services can customize a solution that is best for you, using many of the same technologies. In fact, IBM Global Services announced a variety of new cloud-computing services to help enterprises determine the best approach.
In a related announcement, IBM announced [LotusLive iNotes], which you can think of as a "business-ready" version of Google's GoogleApps, Gmail and GoogleCalendar. IBM is focused on security and reliability but leaves out the advertising and data mining that people have been forced to tolerate from consumer-oriented Web 2.0-based solutions. IBM's clients that are already familiar with the on-premises version of Lotus Notes will have no trouble using LotusLive iNotes.
There was actually a lot more announced today, which I will try to get to in later posts.
technorati tags: IBM, Dynamic Infrastructure, Smart Archive, Information Archive, Information Infrastructure, TSM, SSAM, WORM, NENR, DR550, GMAS, N series, SnapLock, compliance, disk, tape, storage, GPFS, LotusLive, iNotes, SoFS, Google, GoogleApps, Gmail, GoogleCalendar
With the economy recovering from the [Global Recession], my manager has been given authorization to hire new Subject Matter Experts (SMEs) for the [IBM Tucson Executive Briefing Center!] Here are a few answers to questions you might have:
Where is the job located?
The job is located in Tucson, Arizona, which is a great place to live! Tucson is the headquarters for IBM storage design and development, with the largest collection of engineers, software developers and testers. The IBM Tucson Executive Briefing Center is located on the [University of Arizona Science and Technology Park] campus that houses over 7,000 employees from 50 different companies.
What does the job entail?
Primarily, you will be developing, customizing and presenting PowerPoint presentations and live product demos. For some briefings, you will work with sales reps, IBM Business Partners, and clients to develop an agenda of topics to discuss. At times, the presentation may involve working to solve the client's problems, drawing on the whiteboard or flip charts to help capture the requirements and architect a solution.
Which products are we talking about?
The [IBM System Storage product line] includes solid-state drives (SSD), block and file-based disk systems, tape drives and libraries, storage virtualization, and storage management software.
Is there any opportunity for travel?
Most of the presentations will be performed in Tucson, either in person, by webcast or video conference call. Sometimes, this includes discussions over drinks, dinner or golfing. Occasionally, there will be travel to present at client locations, IBM branch offices, events or conferences. My manager estimates approximately 10 percent travel.
Is the pay based on a commission?
Absolutely not! We are consultants, not salespeople. To maintain our "trusted advisor" status, it is a flat salary, with possibility for year-end bonus based on how well our division does overall. This allows us to present and position all of the products fairly to the clients at briefings without bias. Our clients appreciate that! The job is considered pre-sales technical support.
Is training included?
Yes. Assuming you already have a strong background in storage hardware and software, and how these connect to SAN and LAN networks for a variety of operating systems like z/OS, AIX, Windows and Linux, there will be training for the latest updates and features of the IBM products throughout the year. Also, there will be professional training to build up your public speaking and meeting facilitation skills.
How do I apply?
If you are an American citizen, fluent in the English language, and have at least a Bachelor's Degree, go to the [IBM Employment website], look for "Storage Support Specialist" position using job code "STG-0524037" or "STG-0525309". IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
The job is immediately available. Apply today!
My how time flies! It has been nearly a year since our new Tucson Executive Briefing Center had its [Ribbon Cutting Ceremony].
To celebrate this achievement, IBM asked me to write and direct a short film to remind everyone we are here to help clients solve problems, determine an appropriate strategy and make solid purchase decisions.
I have produced other videos for IBM. See my October 2013 blog post [Incorporating Videos] for other examples. This was my first time as writer/director for a project.
This video won't win any Oscars, but I would still like to thank the Academy, and my colleagues IBM VP Calline Sanchez, Lee Olguin, Joe Hayward and Kris Keller for agreeing to be filmed on camera. Behind the scenes, I want to thank IBM Fellow John Cohn for his superb narration, Andrew Greenfield as cinematographer and editor, Shelly Jost as creative consultant selecting the musical tracks, and Denise White for reviewing the screenplay. Finally, I want to thank our producer, Bill Terry, for funding this effort.
What do you think? Will it go viral? Enter your comments below!
Continuing this week's theme on products that were part of last week's IBM Information Infrastructure launch, today I'll cover the TS2900.
This little baby is SWEET! At 1U high, it holds a single drive and up to 9 cartridges, up to a total of 14.4 TB at 2:1 compression. The drive can be a Half-Height (HH) LTO-3 or LTO-4 drive. (It is called an autoloader because there is only a single drive. Automation units with multiple drives are called libraries.)
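For anyone who wants to check the 14.4 TB figure, here is a quick sketch of the arithmetic. The native cartridge capacities (400 GB for LTO-3, 800 GB for LTO-4) come from the LTO generation specifications rather than from this post, and the function name is just something I made up for the example.

```python
# Native (uncompressed) cartridge capacities in GB, per the LTO generation specs.
NATIVE_GB = {"LTO-3": 400, "LTO-4": 800}

def autoloader_capacity_tb(generation, cartridges=9, compression=2.0):
    """Total capacity in decimal TB for a full magazine at the given compression ratio."""
    return NATIVE_GB[generation] * cartridges * compression / 1000

print(autoloader_capacity_tb("LTO-4"))  # 14.4
print(autoloader_capacity_tb("LTO-3"))  # 7.2
```

So the headline number assumes nine LTO-4 cartridges and data that actually compresses 2:1. With LTO-3 media, or with data that does not compress, plan on half of that.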
This can be rack-mounted, or sit on your desktop. There is an I/O station for inserting or removing individual cartridges, as well as a removable tape magazine to populate or remove the tapes in a more efficient manner.
Both LTO3 and LTO4 support a mix of regular and "Write Once, Read Many" (WORM) media to help comply with regulations demanding "Non-erasable, Non-rewriteable" storage. The LTO4 can also support on-drive encryption, managed by the IBM Encryption Key Manager (EKM).
To learn more, see the IBM System Storage [TS2900 page].