My IBM colleague Marissa Benekos brought her hand-held video camera to the [Storage Networking World] conference in Orlando, Florida. I am not there, as I had a conflict with another conference going on here in Tucson, so I am relying on Marissa to feed me information to blog about.
In this segment, she interviews "booth babe" David Bricker. I've known David a long time, and if you are there at the conference, tell him I sent you to visit him at the IBM booth.
Sadly, I can't be in two places at once. SNW is a great conference to attend!
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that many people make for their own personal lives: staying on a budget.
Often, when faced with a tightening budget, we try to make more use of what we already have. Tell someone they are only using 10 percent of their brain, and they immediately believe you; but tell them they are only using 30 percent of their storage, and they ask for a whitepaper, magazine article, or clarification on how that percentage is calculated. I actually visited a customer that was only using 6 percent of the storage attached to their Windows servers!
So, to help those of you making data center resolutions to stay on budget, the terms to remember are "Reduce", "Reuse" and "Recycle".
If you are going to keep on a budget, remember that storage next year will cost roughly 30 percent less than storage today. That is the average annual drop for both disk and tape on a dollar-per-MB basis. If there is any way to postpone giving out storage until it is actually needed, you can save a bundle of money. Timing is everything! In the event of a disaster, getting an immediate replacement for disk can be very expensive, but if you can wait just two weeks, you can negotiate a better deal. I thought of this while going to the movie theatre yesterday. A "hot dog" and a bottle of water was $8.00, but if you are able to wait two hours and eat after the movie, you can get a much better meal for less.
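To put rough numbers on that price decline, here is a minimal Python sketch. It assumes the 30 percent annual drop compounds smoothly over time, which is a simplification; the starting price of 100 and the function names are made up for illustration.

```python
def price_after(price_today, annual_decline, years):
    """Price of the same capacity after `years`, assuming a constant annual decline."""
    return price_today * (1.0 - annual_decline) ** years


def savings_from_deferral(price_today, annual_decline, years_deferred):
    """Money saved by buying the same capacity later instead of today."""
    return price_today - price_after(price_today, annual_decline, years_deferred)


ANNUAL_DECLINE = 0.30   # average yearly drop quoted in the post
PRICE_TODAY = 100.0     # hypothetical price for some fixed amount of capacity

price_next_year = price_after(PRICE_TODAY, ANNUAL_DECLINE, 1)
price_in_six_months = price_after(PRICE_TODAY, ANNUAL_DECLINE, 0.5)
premium_for_buying_today = PRICE_TODAY / price_next_year - 1.0

print(f"Same capacity next year:   {price_next_year:.2f}")
print(f"Same capacity in 6 months: {price_in_six_months:.2f}")
print(f"Premium for buying today:  {premium_for_buying_today:.0%}")
```

Under these assumptions, waiting a year saves 30 percent, which is the same as saying that buying a year early costs about 43 percent more than the later price.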
Chances are, you have unused disk capacity spread across all your storage today, but perhaps it is formatted into small LUNs. The SVC can combine the capacity, and let you carve up big LUNs at the sizes you need. This is like taking all those tiny pieces of soap in your shower and forming a new bar of soap, or taking all the crumbs at the bottom of your bread box, and making a new slice of bread. And the virtual LUNs are dynamically expandable, so give out only the amount needed today, as it is simple to expand them to larger sizes later.
When evaluating your use of tape, determine if you are making best use of the tapes you have now, and perhaps a RECYCLE (or reclamation) scheme may be in order. Fewer tapes can save money in many ways, such as reduced storage costs, and reduced courier costs to send the tapes offsite. Tape media can still be 10-20 times less expensive than disk, based on full capacity.
Wrapping up this week's theme of New Year's Resolutions for the data center, the New York Times argues we should go easy on the resolutions, so I'll conclude with reducing stress. Lighten up! Relax, and try not to take your job so seriously.
(I know you're probably thinking, "That's easy for you to say, Mr. paid
technorati tags: New Years, resolutions, reducing stress, laughter, Tucson Laughter Club, Laughter Yoga, Sun, StorageTek, Kodak, Work/Life Balance, sleep, blogfights, assertive, music, LifeHacker, Live365, Pink Noise
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that people don't always think about on a personal level, that is to hone your tools and skills.
A long time ago, I used to be a regular speaker at the SHARE user group conference. One of the most attended sessions was Sam Golob presenting the latest CBT Tape set of tools. Over time, this large collection of "mainframe shareware" was handed out on 3480 tape cartridges, then on CDs, and finally made downloadable off the web. Sam's main point, which I remember to this day, was that everyone who has a job should figure out what tools they use, keep those tools functioning properly, and learn to use them well.
Later, I took some cooking classes at a culinary school. Among other things, we learned:
This last point hits close to home, as many people like me have too many tools that they do not use often enough to know how to use them well. Do I really need my strawberry corer, garlic press, or a tray designed for the storage and delivery of deviled eggs?
The same could be said about software tools. What tools do you use in your job? Do you feel you know how to take full advantage of their power and capabilities? If you develop software, do you know all the features of your debugging tools? If you develop advertising or marketing materials, do you know all the features of your photo or video editing software? If you manage storage in a data center, do you know all the tools for managing your storage area network (SAN), disk systems, tape libraries, and reporting tools to identify all of your files and databases across your entire IT environment? I would not be surprised if you could replace a whole mess of tools with just one, such as the IBM TotalStorage Productivity Center.
Happy New Year!
This year I resolve to be more consistent in my blogging, and my goal is to give you one to five entries per week, every week, based on the advice from Glenn Wolsey, Jennette Banks, and others. On some weeks, I will have a running theme, so rather than writing super-long entries to cover everything I can think of on a topic, I can keep the entries short and readable. This week is a good time to review last year's "New Year's Resolutions" and to make new ones for 2007. I will discuss actions that companies can adopt for their data centers.
A common resolution is to lose weight, as in this Dilbert comic. Last year, I resolved to lose weight in 2006, and am delighted with myself that I lost eight pounds. When people ask for the secret of my success, I whisper in their ear "Eat less, exercise more." In general, people (and companies) know what to do, but just don't do it, which Pfeffer and Sutton document in their book The Knowing-Doing Gap. In my case, it involved lifestyle change: I exercised at a gym three times per week in Tucson, with a personal trainer, and revamped my diet.
Not everyone subscribes to the "eat less exercise more" philosophy. For example, Ric Watson argues in his blog that you can eat fewer calories, but eat more in actual volume, by choosing the right foods. This brings up the issues of "metrics" that most data centers are familiar with. Last year, I read the book "You: On a Diet" which explains that it is better to focus on "waist reduction" as measured in inches around your mid-section at the belly button, than "weight reduction" as measured in pounds. This year, I resolve to get down to 35 inches by the end of 2007.
The problem with measuring "weight" is that you are weighing bones, muscle and fat. A person can gain ten pounds of muscle, lose ten pounds of fat, and the scale would indicate no progress. The same problem occurs in data centers. How many TB of data do you have? Storage admins can easily tell you, but can they tell how much of this is bone (data needed for operating infrastructure), muscle (data used in daily operations that generates revenue) or fat (obsolete or orphaned data)?
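As a rough illustration of that metric, the Python sketch below totals capacity by category instead of reporting a single TB figure. The inventory entries, sizes, and category assignments are entirely hypothetical; in practice, deciding which data is bone, muscle or fat is the hard part.

```python
from collections import defaultdict

# Hypothetical inventory: (dataset name, size in TB, category).
# Categories follow the analogy in the post; names and sizes are made up.
INVENTORY = [
    ("operating system images", 4.0, "bone"),
    ("order database", 12.0, "muscle"),
    ("customer email", 6.0, "muscle"),
    ("orphaned volume from a retired server", 9.0, "fat"),
    ("obsolete test copies", 5.0, "fat"),
]


def capacity_by_category(inventory):
    """Sum the capacity in TB for each category."""
    totals = defaultdict(float)
    for _name, size_tb, category in inventory:
        totals[category] += size_tb
    return dict(totals)


def share_by_category(inventory):
    """Fraction of total capacity that each category represents."""
    totals = capacity_by_category(inventory)
    grand_total = sum(totals.values())
    return {category: size / grand_total for category, size in totals.items()}


for category, share in sorted(share_by_category(INVENTORY).items()):
    print(f"{category}: {share:.0%} of capacity")
```

The total here is 36 TB either way; the breakdown is what tells you whether growth is muscle or fat.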
We at IBM often state that "Information Lifecycle Management (ILM)" is more a lifestyle change than a "fad diet". Figuring out what data you should capture in the first place, where to place it, when to move it, and when to get rid of it, is more important than just buying different tiers of storage hardware. So, for those looking to make new data center resolutions, I suggest the following actions:
Well, I'm back from my vacation in Bali and Singapore, and am glad to see that my fellow blogger BarryB [aka Storage Anarchist] also had a chance to take a break in exotic locations.
Next Thursday, in the USA, is the [Thanksgiving holiday], so this will give me a chance to catch up on my email and read everyone's blog posts and product announcements.
The following week, December 2-5, I'll be attending the 27th annual [Data Center Conference] at the MGM Grand hotel and casino in Las Vegas, Nevada. IBM is a Premier and Platinum sponsor for this event. Look for me in one of the many break-out sessions, one-on-one executive meetings, or IBM's "booth 20" at the solution center. Our team will be showing off IBM's XIV, SVC and TotalStorage Productivity Center offerings, as well as explaining IBM Information Infrastructure and the rest of the New Enterprise Data Center strategy.
Next week, December 3-6, Gartner is holding their annual [Data Center Conference]. This year, the guest keynote speakers include [Captain Chesley B. "Sully" Sullenberger] (the pilot who saved a planeful of people by landing the plane on the Hudson river in New York City), humorist [Dave Barry], and Tommy Minyard (Director of Advanced Computing Systems at the [Texas Advanced Computing Center]).
I attended this conference the past four years, but sadly will not be attending this year. If you are attending this conference for the first time, perhaps a quick look at my blog posts from last year will help you get oriented:
For even more nostalgia, here are my recaps for the prior years:
If you are not attending, you can follow the action with [Twitter hashtag #garterdc].
For those in the US, last Friday, the day after Thanksgiving, marked the official start of the Holiday shopping season. This has been called [Black Friday] as some stores open as early as 4 a.m., when it is still dark outside, to offer special discount prices. Some shoppers camp out in sleeping bags and lawn chairs in front of stores overnight to be the first to get in.
Not surprisingly, some folks don't care for this approach to shopping, and prefer instead shopping online. Since 2005, the Monday after Thanksgiving (yesterday) has been called [Cyber Monday]. USA Today newspaper reports [Cyber Monday really clicks with customers]. Many of the major online shopping websites indicated a 37 percent increase in sales yesterday over last year's Cyber Monday.
On Deadline dispels the hype on both counts: [Cyber Monday: Don't Believe the Hype?], indicating that Black Friday is not the peak shopping day for bricks-and-mortar shops, and that Cyber Monday is not the busiest online shopping day of the year, either.
Despite the controversy, all of this increased use of the internet could lead to what is now being termed an "Internet Brown-out" in the next few years. Margaret Rouse of [IT Knowledge Exchange] points to this MacWorld article by Grant Gross titled [Study: Internet could run out of capacity in two years]. Here's an excerpt:
A flood of new video and other Web content could overwhelm the Internet by 2010 unless backbone providers invest up to US$137 billion in new capacity, more than double what service providers plan to invest, according to the study, by Nemertes Research Group, an independent analysis firm. In North America alone, backbone investments of $42 billion to $55 billion will be needed in the next three to five years to keep up with demand, Nemertes said.
If the "161 Exabytes" figure sounds familiar, it is probably from the IDC Whitepaper [The Expanding Digital Universe] that estimated the 161 Exabytes created, captured or replicated in 2006 will increase six-fold to 988 Exabytes by the year 2010. This is not just video captured for YouTube by internet users, but also corporate data captured by employees, and all of the many replicated copies. The IDC whitepaper was based on the University of California, Berkeley's earlier and often-cited 2003 [How Much Info?] study, which not only looked at magnetic storage (disk and tape), but also optical, film, print, and transmissions over the air like TV and Radio.
A key difference was that while UC Berkeley focused on newly created information, the IDC study focused on digitized versions of this information, and included the added impact of replication. It is not unusual for a large corporate database to be replicated many times over. This is done for business continuity, disaster recovery, decision support systems, data mining, application testing, and IT administrator training. Companies often also make two or three copies of backups or archives on tape or optical media, to store them in separate locations.
Likewise, it should be no surprise that internet companies maintain multiple copies of data to improve performance. How fast a search engine can deliver a list of matches can be a competitive advantage. Content providers may offer the same information translated into several languages. Many people replicate their personal and corporate email onto their local hard drives, to improve access performance, as well as to work offline.
The big question is whether we can assume that an increased amount of information created, captured and replicated will have a direct linear relation to the growth of what is transmitted over the internet. Three fourths of U.S. internet users watched an average of 158 minutes of online video in May 2007. Is this also expected to grow six-fold by 2010? That would be nearly sixteen hours a month at current video densities, or more likely it would be the same 158 minutes but of much higher quality video.
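The arithmetic behind those figures can be sketched in a few lines of Python. The inputs are the numbers quoted above; the assumption that viewing time scales by the same six-fold factor as stored information is exactly the linear relation being questioned, so treat the last figure as a what-if rather than a forecast.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# IDC estimates quoted in the post, in exabytes.
EXABYTES_2006 = 161.0
EXABYTES_2010 = 988.0

growth_factor = EXABYTES_2010 / EXABYTES_2006
annual_growth = compound_annual_growth_rate(EXABYTES_2006, EXABYTES_2010, 2010 - 2006)

# What-if: monthly viewing grows by the same rounded six-fold factor.
MINUTES_PER_MONTH_2007 = 158.0
projected_hours_per_month = MINUTES_PER_MONTH_2007 * 6 / 60.0

print(f"Growth 2006 to 2010: {growth_factor:.2f}x")
print(f"Implied growth per year: {annual_growth:.0%}")
print(f"Six-fold viewing time: {projected_hours_per_month:.1f} hours per month")
```

A six-fold increase over four years works out to data growing by a little more than half again every year.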
On the other hand, much of what is transmitted is never stored, or stored for only very short periods of time. Some of these transmissions are live broadcasts: you are either there to watch and listen to them when they happen, or you are not. Online video games are a good example. The internet can be used to allow multiple players to participate in real time, but much of this is never stored long-term. An interesting feature of the Xbox 360 is to allow you to replay "highlight" videos of the game just played, but I do not know if these can be stored away or transferred to longer term storage.
Of course, there will always be people who will save whatever they can get their hands on. Wired Magazine has an article [Downloading Is a Packrat's Dream], explaining that many [traditional packrats] are now also "digital packrats", and this might account for some of this growth. If you think you might be a digital packrat, Zen Habits offers a [3-step Cure].
In any case, the trends for both increased storage demand, and increased transmission bandwidth requirements, are definitely being felt. Hopefully, the infrastructure required will be there when needed.
This week, the SHARE conference is being held at the Colorado Convention Center in Denver, Colorado. I covered this conference for 10 years earlier in my career. Now, my colleague Curtis Neal covers these on a regular basis, and is giving the following presentations this week:
Unlike other conferences where people just go once and are never seen again, SHARE brings the same people back year after year, so that you can maintain relationships across organizations, and can carry on forward-looking strategic discussions.
In my last blog post [Is this what HDS tells our mainframe clients?], I poked fun at Hu Yoshida's blog post that contained a graphic with questionable results. Suddenly, the blog post disappeared altogether. Poof! Gone!
Just so that I am not accused of taking a graph out of context, here is Hu's original post, in its entirety:
At this point, you might be wondering: "If Hu Yoshida deleted his blog post, how did Tony get a copy of it? Did Tony save a copy of the HTML source before Hu deleted it?" No. I should have, in retrospect, in case lawyers got involved. It turns out that deleting a blog post does not clear the various copies in various RSS Feed Reader caches. I was able to dig out the previous version from the vast Google repository. (Many thanks to my friends at Google!!!).
The graph itself, which was hosted separately, has also been deleted, but it was just taken from slide 10 of the HDS presentation [How to Apply the Latest Advances in Hitachi Mainframe Storage], so it was easy to recreate.
(Lesson to all bloggers: If you write a blog post, and later decide to remove it for whatever legal, ethical, moral reasons, it is better to edit the post to remove offending content, and add a comment that the post was edited, and why. Shrinking a 700-word article down to 'Sorry Folks - I decided to remove this blog post because...' would do the trick. This new edited version will then slowly propagate across to all of the RSS Feed Reader caches, eliminating most traces to the original. Of course, the original may have been saved by any number of your readers, but at least if you have an edited version, it can serve as the official or canonical version.)
Perhaps there was a reason why HDS did not want to make public the FUD its sales team use in private meetings with IBM mainframe clients. Whatever it was, this appears to be another case where the cover-up is worse than the original crime!
I am in Toronto, Canada. It is cold and rainy here, a lot worse than last week in Seoul, Korea. This looks like a slow news week, so slow that the only news here in Canada is the possibility of a new 5-dollar coin. I thought I would make this week's theme about enterprise applications.
IBM doesn't make these applications anymore; we have decided to focus on our core strength, to be the best IT platform to run other people's applications. This means being the best IT systems, software and services company. However, many of the companies that make enterprise applications both cooperate with and compete against parts of IBM, what we call "coopetition".
Let's take a look at some acronyms in this space:
This week I will cover applications that address these, and how they relate to storage.
Wrapping up my week in China, I read an article by Li Xing in the local "China Daily" about energy efficiency in buildings. She argues that it is not enough for a building to be energy-efficient on its own; you also have to consider its impact on the other buildings around it. Does it reflect the sun so harshly into neighboring windows that people are forced to put up blinds and use artificial light? Does it block the sun, so that rooms that previously could be used with natural sunlight must now be artificially lit?
A similar effect happens with power and cooling in the data center. Servers and storage systems generate heat, and that heat affects all the other equipment in the data center. IBM has the most power-efficient and heat-efficient servers and storage, but that is not enough. You have to consider the heat generated by all systems that might raise overall temperature.
This is what motivated IBM to deliver the IBM Rear Door Heat eXchanger, a member of IBM's CoolBlue(tm) portfolio.
According to a press release:
Research has indicated that water can remove far more heat per volume unit than air. For example, in order to disperse 1,000 watts, with 10 degree temperature difference, only 24 gallons of water per hour is needed, while the same space would require nearly 11,475 cubic feet of air. IBM's Rear Door Heat eXchanger helps keep growing datacenters at safe temperatures, without adding AC units. The unobtrusive solution brings more cooling capacity to areas where heat is the greatest -- around racks of servers with more powerful and multiple processors.
The eXchanger works on standard 42U racks, and can help clients deal with the rapid growth of rack-mounted servers and storage on their raised floor. How cool is that!
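The water-versus-air comparison in the press release can be sanity-checked with the basic heat equation, energy = mass x specific heat x temperature rise. The Python sketch below is only a back-of-the-envelope check: it assumes the 10 degree difference is in Celsius and that the quantities are per hour for air as well as water (the release does not say either way), and it uses approximate textbook properties for water and air near room temperature.

```python
SECONDS_PER_HOUR = 3600.0
LITERS_PER_US_GALLON = 3.785411784
CUBIC_FEET_PER_CUBIC_METER = 35.3147

# Approximate textbook values near room temperature (assumptions, not IBM data).
WATER = {"specific_heat": 4186.0, "density": 1000.0}  # J/(kg*K), kg/m^3
AIR = {"specific_heat": 1005.0, "density": 1.2}       # J/(kg*K), kg/m^3


def coolant_volume_m3(power_watts, delta_t_kelvin, fluid, seconds=SECONDS_PER_HOUR):
    """Volume of fluid that absorbs the heat produced over `seconds`
    while warming by delta_t_kelvin."""
    energy_joules = power_watts * seconds
    mass_kg = energy_joules / (fluid["specific_heat"] * delta_t_kelvin)
    return mass_kg / fluid["density"]


water_m3 = coolant_volume_m3(1000.0, 10.0, WATER)
air_m3 = coolant_volume_m3(1000.0, 10.0, AIR)

water_gallons = water_m3 * 1000.0 / LITERS_PER_US_GALLON
air_cubic_feet = air_m3 * CUBIC_FEET_PER_CUBIC_METER

print(f"Water: about {water_gallons:.0f} US gallons per hour")
print(f"Air:   about {air_cubic_feet:.0f} cubic feet per hour")
print(f"Air needs roughly {air_m3 / water_m3:.0f} times the volume of water")
```

Under these assumptions the results come out near 23 gallons of water and about 10,500 cubic feet of air, the same ballpark as the figures quoted in the release.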
Continuing my week in Auckland, New Zealand, I presented my last three topics for the week.
We often joke that I.B.M. stands for "Information Between Meals"! Here we are at a restaurant in the [Britomart] area. I am on [the Paleo diet], which is low-carb, high-protein, dairy-free and gluten-free, and am trying to stick with it even when on the road traveling. Sometimes it can be challenging. Tonight, I opted for a light dinner, just roasted vegetables and grape-flavored beverage.
The folks in New Zealand love sheep. There are nine sheep for every person in this country. Here are some metal sculpture lawn ornaments.
Hyein and I needed new "desktop wallpaper" photos for our laptops. For those who want to dress up their laptops, here's one for each of us. (Click on each photo to see full size). Hyein kept getting her hair in the way. I didn't have that problem, but was worried my cap would fly off my head. This cap was a gift from my clients at [James Cook University in Brisbane, Australia].
In Top Gun classes, the top students are given "Top Gun" caps and their picture is published on the official website for all to see their success. Overall, the entire class did very well, and these three outstanding students had the top scores.
I am now in Sydney, Australia -- to teach Top Gun class again!
The BP oil spill in the Gulf of Mexico is a good reminder that all organizations should consider practice and execution of their contingency plans. In this most recent case, the [Deepwater Horizon] oil platform had an explosion on April 20, resulting in oil spewing out at an estimated 19,000 barrels per day. While some bloggers have argued that BP failed to plan, and therefore planned to fail, I found that hard to believe. How can a billion-dollar multinational company not have contingency plans?
The truth is, BP did have plans. Karen Dalton Beninato of New Orleans' City Voices discusses BP's Gulf of Mexico Regional Oil Spill Response Plan (OSRP) in her article [BP's Spill Plan: What they knew and when they knew it]. A [redacted 90-page version of the OSRP] is available on their website. The plan indicates that it may take 30 days for oil from a deep offshore leak to reach the shoreline, giving OSRP participants plenty of time to take action.
(Having former politicians [blame environmentalists] for this crisis does not help much either. At least the deep offshore rigs give you 30 days to react to a leak before the oil gets to the shoreline. Having oil rigs closer to shore will just shorten this time to react. Allowing onshore oil rigs does not mean oil companies would discontinue their deep offshore operations. There are thousands of oil rigs in the Gulf of Mexico. Extracting oil in the beautiful Arctic National Wildlife Refuge [ANWR] might be safer, but it does not eliminate the threat entirely, and any leak there would be damaging to the local plants and animals in the same manner.)
So perhaps the current crisis was not the result of a lack of planning, but inadequate practice and execution. The same is true for IT Business Continuity / Disaster Recovery (BC/DR) plans. In all cases, there are four critical parts:
If you have not tested your IT department's BC/DR plans, perhaps it's time to dust off your copy, review it, and schedule some time for practice.
ESG Analyst, Tony Asaro, talks about the many small storage startups having a Billion Dollar Impact on the storage system industry. Tony has counted over 50 storage system vendors that are now in the marketplace. Is it really that many? Most of the time, the media only focus on the top seven major players, but I agree that big players like IBM should take trends about small startups like this seriously.
EMC Blogger Chuck Hollis suggests that this trend might be the start of a squeeze play, where top players and new upstarts squeeze out the middle players like Sun and HDS, in his post Desperate Times In Storage Land?
(His statement that IDC and Gartner have listed EMC as number one in "almost all" market segments is perhaps a bit misleading. IBM is number one in overall storage hardware, as well as leading in tape drives, tape libraries, tape virtualization, and for that matter, disk virtualization. I don't know if IDC or Gartner count EMC Disk Library in the "tape virtualization" category, or if either analyst distinguishes between "cache-based" versus "switch-based" disk virtualization as separate categories. Perhaps Chuck should have qualified this to say "almost all of the market segments that EMC does business in," which of course is better than the other vendors in the middle.)
This time around, Chuck pokes fun at HDS, IBM, Sun, NetApp and HP, much like "that guy" that skewers our favorite SouthPark characters Cartman, Kenny, Stan, Kyle in this Comedy Central MMORPG parody video. (And no, I am not suggesting Chuck looks anything like the cartoon character or his corresponding avatar.)
Perhaps putting me in the same not-
The results are finally in. IBMer Wolfgang Singer was awarded the "Top Speaker" award for his NAS and iSCSI tutorial at last year's Orlando 2006 conference. Here he is receiving the award from SNIA Executive Director Leo Leger.
Of course, NAS and iSCSI technologies have been around for a while, but they are still new for many customers, which is why tutorials like this are so important.
Not everyone is clear on these technologies. For example, Dave Hitz asks is iSCSI SAN or is iSCSI NAS? I Don’t Know.
To avoid this confusion, IBM adopted clarifying terminology.
Today, fellow IBMer Ken Hannigan celebrated his 25th anniversary with IBM, which inducts him into the IBM Quarter Century Club [QCC]. I was surprised to hear that there are over 900 QCC members currently residing in Arizona. In the past, QCC was shortly followed by retirement, but in these economic times, it marks a mid-point in one's career.
I met Ken back in 1988, when I was working on DFHSM and he was part of the DFDSS team that moved from San Jose, California to Tucson, Arizona. Later, Ken and I would work in the same department as architects for the DFSMS product that included DFSMShsm and DFSMSdss components.
Ken was then offered a chance to lead the effort to launch a new product from an internal project called Workstation Data Save Facility (WDSF) that was changed to Data Facility Distributed Storage Manager (DFDSM), then renamed to ADSTAR Distributed Storage Manager (ADSM), and finally to the name it has today: [IBM Tivoli Storage Manager].
Over the years, Ken's had some interesting experiences. Two examples:
Ken has been one of my best friends over the past twenty years. I introduced him to his wife, and was the best man at his wedding. It is quality people like Ken that make working at IBM so special.
Tony Asaro has a nice piece about Confirmation Bias.
There's nothing worse than feeling you made a bad decision. My favorite is buying something, and then finding it at a lower price somewhere else. Or worse, being in a country where you haggle over prices, and finding out that I might have been able to haggle further down than what I had paid.
Of course, the solution to making better, more informed decisions, is getting more information. That's what I love about being in the storage business.
Last month, I had the pleasure of helping train Watson in its latest mission: to help answer questions from sellers. This is not just for the IBM feet on the street, but for IBM distributors and IBM Business Partners as well.
In their post [Workers Spend Too Much Time Searching for Information], Cottrill Research explains the problem all too well. Here is an excerpt:
"... [survey by SearchYourCloud] revealed 'workers took up to 8 searches to find the right document and information.' Here are a few other statistics that help tell the tale of information overload and wasted time spent searching for correct information -- either external or internal:
In the early days of the Internet, before search engines like Google or Bing, I competed in [Internet Scavenger Hunts]. A dozen or more contestants would be in a room, and would be given a list of 20 questions to find answers for. Each of us would then hunt down answers on the Internet. The person to find the most documented answers before time runs out wins. It was quite the challenge!
Over the years, I have honed my skills as a [Search Ninja]. With over 30 years of experience in IBM Storage, many sellers come to me for answers. Sometimes sellers are just too lazy to look for the answers themselves, too busy trying to meet client deadlines, or too green to know where to look.
A good portion of my 60-hour week is spent helping sellers find the answers they are looking for. Sometimes I dig into the [SSIC], product data sheets, or various IBM Redbooks.
Other times, I would confer with experts, engineers and architects in particular development teams. Often, I learn something new myself. In a few cases, I have turned some questions into ideas for blog posts!
It was no surprise when I was asked to help train Watson for the new "Systems SmartSeller" tool. This will be a tool that runs on smartphones or desktops to help sellers answer the questions they need to respond to in RFPs or other client queries.
The premise was simple. Treat Watson as a student at "Cognitive University" taking classes from dozens of IBM professors, in a series of semesters, or "phases".
Phase I involved building the "Corpus", the set of documents related to z Systems, POWER systems, Storage and SDI solutions; and a "Grading Tool" that would be used as the Graphical User Interface. I was not involved in phase I.
Phase II was where I came in. Hundreds of questions were categorized by product area, and I worked on some 500 questions for storage. For each question, Watson had up to eleven different responses, typically a paragraph from the Corpus. My job as a professor was to grade each of those responses:
Most of the answers were either 1-star (not storage related) or 2-star (mentioned storage, but poor response). I would search through the existing Corpus looking for a better answer, and at best found only 3-star responses, which I would add to the list and grade as a 3-star response.
I then searched the Internet for better answers. Once I found a good match, I would type up a 4-star response, add it to the list, and point it to the appropriate resources on the Web.
Other professors, who were also looking at these questions, would then get to grade my suggested responses as well. Watson would learn based on the consensus of how appropriate and accurate each response was graded.
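The post does not say how Watson turns multiple professors' grades into a consensus, so the Python sketch below is purely illustrative: it assumes each response collects star ratings from several graders, takes the median as the consensus, and ranks responses by it. The response identifiers and ratings are made up.

```python
from statistics import median

# Hypothetical grades for one question: response id -> star ratings
# given by different professors. Ids and ratings are invented.
GRADES = {
    "corpus-paragraph-17": [1, 2, 1],
    "corpus-paragraph-42": [3, 3, 2],
    "professor-written-answer": [4, 4, 3],
}


def consensus_grade(star_ratings):
    """One simple notion of consensus: the median rating, which is less
    sensitive to a single outlying grader than the mean."""
    return median(star_ratings)


def rank_responses(grades):
    """Response ids ordered from highest to lowest consensus grade."""
    return sorted(
        grades,
        key=lambda response_id: consensus_grade(grades[response_id]),
        reverse=True,
    )


ranking = rank_responses(GRADES)
for response_id in ranking:
    print(f"{response_id}: {consensus_grade(GRADES[response_id])} stars")
```

The median is only one possible choice; a real training pipeline might weight graders differently or use the ratings as labels for a learned ranking model.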
I don't know where the Cognitive University team got some of the questions, but they were quite representative of the ones I get every week. In some cases, the seller didn't understand the question he heard from the client, making it difficult for me to figure out what they were actually asking for.
It reminds me of that parlor game ["Telephone" or "Chinese Whispers"], in which one person whispers a message to the ear of the next person through a line of people until the last player announces the message to the entire group. I have actually played this at an IBM event in China!
Watson needs to parse the question into nouns and verbs, and use that Natural Language Processing (NLP) to then search the Corpus for an appropriate answer. I identified three challenges for Watson in this case:
I managed to grade the responses in the two weeks we were given. Part of my frustration was that the grading tool itself was a bit buggy, and I spent some time trying to track down some of its flaws.
The next phase is in late January and February. This will give the Cognitive University team a chance to update the Corpus, improve the grading interface, and find more professors and a different set of questions. I volunteered the most recent four years' worth of my blog posts to be added to the Corpus.
Maybe this tool will help me turn my 60-hour week back to the 40-hour week it should be!
technorati tags: IBM, Watson, Cottrill Research, SearchYourCloud, McKinsey, IDC, Google, Bing, Search Ninja, Internet Scavenger Hunts, SSIC, Telephone Game, Chinese Whispers, NLP, RFP, Storwize, RtC, Zoltar, Cognitive University