A chat with Watson Project lead, Dr. David Ferrucci

From SXSWi 2011: How do you debug an intelligence that emerged from many algorithms?

Principal Investigator for IBM®'s DeepQA/Watson project, Dr. David Ferrucci, was in Austin, Texas for SXSWi 2011. He stopped to talk with Todd Watson (no relation to the supercomputer) and Scott Laningham about IBM Watson's historic win on the Jeopardy! game show and about what the project means to the developer community.

Scott Laningham (scottla@us.ibm.com), Podcast Editor, IBM developerWorks

Scott Laningham, host of developerWorks podcasts, was previously editor of developerWorks newsletters. Prior to IBM, he was an award-winning reporter and director for news programming featured on Public Radio International, a freelance writer for the American Communications Foundation and CBS Radio, and a songwriter/musician.



28 March 2011

developerWorks: Scott Laningham at South by Southwest 2011, the interactive portion of the conference. And Todd Watson, who you've been hearing with me on the developerWorks podcast, covering great things happening at the conference.

But we're thrilled to have an opportunity to do a video interview with Dr. David Ferrucci, who is sitting over here out of the shot. Dr. Ferrucci is the leader of the Watson Project, and you all must be familiar with that: Watson just defeated some of the best champions on Jeopardy!, and he's stopping just for a few moments to chat with us. Thanks so much for being here.

Get ready for complex debugging tools


Ferrucci: You're very welcome. It's a pleasure.

developerWorks: Well, first of all, what does bring you to town? Tell us what's happening.

Ferrucci: Well, people like you and others who want to hear a little bit more about Watson and what happened on Jeopardy, but not only that, where is it going? Where is the Watson technology going? So I think I'm here to talk to a whole bunch of people about that.

developerWorks: I have to ask you this, because you know the audience will want to know, and did the contest after Jeopardy, against the congressmen, in any way diminish the excitement about all of this?

Ferrucci: Not at all. In fact, if anything, it really helped. You know, we were...of course, you know, when Congressman Holt magically just sort of won the first round of the Jeopardy contest, we were all worried about, gee, what will people think about this? But you know, we knew that the Jeopardy game has lots of ups and downs. It's got those Daily Doubles and you can double your score, and you can get a bad category every once in a while.

So there's a big sort of luck element in the Jeopardy game. And our focus was really on the technology that...a capability that demonstrated that you can compete with champions, that you're good enough that you can understand these questions well enough to get answers, to get confidences in those answers, to analyze all the confidence.

Once you can do that, then we all know that now you're in the league to play with champions. But if you put two champions together—the best and the best—they have about a 30 percent chance of beating one another because of the luck elements in Jeopardy. So, you know, that kind of thing, of course, happens.

But it turned out to be really great for us, actually, because Congressman Holt looked at this and said, wow, this is such an amazing...as a Jeopardy player himself and as a physicist from Princeton, he sort of had an appreciation for how difficult of a challenge this really is. And he's been talking it up all over the Hill, right, and just getting a lot of attention.

And one of the messages he has is we've got to take chances. We've got to invest in research. We've got to invest in R&D. And you know, IBM is doing this as a private company, but the public sector has to do it as well. So rather than taking money out of research, government's got to be putting money in research.

developerWorks: Fantastic. I want Todd to get in here.

Watson (not the computer one): So, can I say, "Dave"—is that okay, Dave? First of all, I want it to be known that my name was Watson before...my last name was Watson before this whole thing came up. Seriously, though, I mean, kudos to you and the entire team. The pure R&D effort that was put into this I think is just phenomenal.

As somebody who's been around IBM for a while, I saw the Kasparov thing live and in person, I just, I sat in front of my TV every night just, my jaw was dropping. And I did see the webcast you did internally, which we were privileged to see, to get a little more insight into the Q&A technology.

So I wanted to kind of probe on that for the developerWorks audience especially. Was there anything or any moments that surprised you this last year that you just kind of had an epiphany about the technology and the approach you guys were taking?

Ferrucci: So I think what surprised us always was, you know, how difficult it was to tell whether or not Watson would be able to answer a particular question. I think that's because no human could absorb all that content and say, I know exactly what all that content says. We had a sense of what the algorithms were and weren't capable of, so we knew that if the information was organized in a certain way, Watson could read it and understand it, but you never knew what that content really said.

So I remember one question where it was, what's the eighth wonder of the world, or something like that. And you know, Watson said King Kong, right? Because it had probably read the script of the King Kong movie and...[ LAUGHTER ]

...was aptly certain. So we look at that and we go, wow, you know, the system has to start to understand, look for cues about whether or not information is factual, fictional, opinion, and you have to start assessing information from that other dimension.

So every time it would get one of these questions wrong in sort of this weird way, we would say, wow, we've got a lot more work to do. And it's never about that one question, because you're not going to see that question again. It's about what...how do you have to advance the algorithms in general to start to deal with a problem that you're seeing?

So we constantly have to do error analysis, we have to iterate on what kinds of questions are we able to get, which ones are we missing, how do we advance the algorithms?
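To make that concrete, here is a minimal sketch, not the project's actual code, of what scoring along that extra dimension could look like: evidence is discounted when its source looks fictional or opinion rather than factual. The genre labels and multipliers are invented for illustration.

Listing 1. Discounting evidence by source genre (illustrative sketch)

import java.util.Map;

/** Illustrative sketch only: discount evidence drawn from sources judged
 *  fictional or opinion before it contributes to an answer's confidence.
 *  The genres and multipliers are assumptions, not DeepQA internals. */
public class SourceGenreWeighting {

    enum Genre { FACTUAL, OPINION, FICTIONAL }

    static final Map<Genre, Double> DISCOUNT =
            Map.of(Genre.FACTUAL, 1.0, Genre.OPINION, 0.5, Genre.FICTIONAL, 0.1);

    static double weightedEvidence(double rawScore, Genre sourceGenre) {
        return rawScore * DISCOUNT.get(sourceGenre);
    }

    public static void main(String[] args) {
        // "King Kong" is strongly supported -- but by a movie script.
        System.out.printf("King Kong (fictional source): %.2f%n",
                weightedEvidence(0.90, Genre.FICTIONAL));   // 0.09
        System.out.printf("Encyclopedia passage:         %.2f%n",
                weightedEvidence(0.60, Genre.FACTUAL));     // 0.60
    }
}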

Watson: Yes, I felt that's one of the things that people kind of missed in the explanations afterwards, they didn't.... They would watch and say, oh, like the Toronto thing, like, well, of course that question got missed because of the missing context. But as you go forward, it was going to be able to figure that out. And I think that's the learning that people need to understand, is...

Ferrucci: Actually, there's something really interesting that went on with that question, is that you know, most people look at that and they think, oh, U.S. Cities, why didn't it answer with a U.S. city? But the reality is that just because the Jeopardy category says U.S. Cities, does not mean all the answers are U.S. Cities.

In fact, there are examples where it says U.S. Cities and the answer is the Erie Canal, or shamrock...there's just all kinds of things they ask about. It could be European cities that are like U.S. cities, right?

So Watson learned actually, which is interesting, that the category does not give a clear signal on what the answer type is actually supposed to be. And then the question didn't say, "it's this U.S. city's largest airport;" it just said, "its largest airport." So it gave some evidence, some weight to being a U.S. city but not a whole heck of a lot. Chicago was its second answer.

But bottom line—and this is a key thing about the technology—Watson was able to assess the evidence. It knew that of everything it tried to understand, none of it really supported either of the answers. Its first answer was Toronto; the second answer was Chicago, the right answer. Both were well below the confidence threshold.

So what was really interesting about what we did here, what the technology does and what we as a team sort of invented here, was how to gather and score evidence and say, you know what? Fifty percent sure, sixty percent sure, not really understanding this. So we don't just throw, you know, a bunch of garbage up there; we're actually sitting there and saying that based on our analysis of all the content, here's what the confidence is.
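As an illustration of the idea, and emphatically not IBM's actual scoring model, here is a sketch that combines independently produced evidence scores into a single confidence and declines to commit when nothing clears a threshold. The feature names, weights, bias, and threshold are all assumptions.

Listing 2. Combining evidence scores into a confidence (illustrative sketch)

import java.util.Map;

/** Illustrative sketch: combine per-dimension evidence scores into a single
 *  confidence and decline to answer when no candidate clears a threshold.
 *  The feature names, weights, bias, and threshold are assumptions, not
 *  DeepQA's actual model. */
public class ConfidenceSketch {

    /** Weighted logistic combination of evidence scores in [0, 1]. */
    static double confidence(Map<String, Double> evidence, Map<String, Double> weights) {
        double z = -2.0;                        // bias: skeptical by default
        for (Map.Entry<String, Double> e : evidence.entrySet()) {
            z += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return 1.0 / (1.0 + Math.exp(-z));      // squash into a 0..1 confidence
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of(
                "passageSupport", 3.0,          // textual evidence dominates
                "answerTypeMatch", 1.0);        // the category is a weak signal

        // "Toronto": modest passage support, poor type match for U.S. Cities.
        double toronto = confidence(
                Map.of("passageSupport", 0.30, "answerTypeMatch", 0.20), weights);
        // "Chicago": similar passage support, strong type match.
        double chicago = confidence(
                Map.of("passageSupport", 0.32, "answerTypeMatch", 0.90), weights);

        double threshold = 0.50;                // below this, don't buzz in
        System.out.printf("Toronto %.2f, Chicago %.2f%n", toronto, chicago);
        if (Math.max(toronto, chicago) < threshold) {
            System.out.println("Neither candidate clears the threshold: low confidence.");
        }
    }
}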

developerWorks: I mean, I want to ask you, because I know we don't have a lot of time, but the developerWorks audience, what should worldwide developers, not just people that use IBM technologies, but all software developers, what's coming out of this project that they need to get really excited about?

Ferrucci: I think there's a couple of things. You know, one is sort of the hardcore development that has to go into a type of system like this, which I think creates tremendous challenges. The system is enormously complex in the following way. In order to innovate rapidly, we had to let lots of different folks, lots of different members of the team, develop algorithms that looked at the data from very different perspectives.

So what that means is we couldn't all get around the table and have one single overarching data model that supported every algorithm. Every algorithm couldn't output a confidence value that synched up perfectly with everyone else's confidence value. So what happened was we had to allow a fairly complex system to emerge where some algorithms would overlap with others; some could contradict others. So you end up with a very complex system.

So what's challenging for developers is, when you see an intelligence, a capability, emerge from the interaction of many, many algorithms, how do you debug that? How do you understand what's going on?

We had to advance lots of tools. There are huge opportunities here to create advanced tooling that will allow developers to create these almost organic systems—systems that many people can contribute to without precise interconnections. So we like to think of it as loosely coupled, so loosely coupled algorithms.

But how do you manage a collection of loosely coupled algorithms? How do you debug them? And we invented some really interesting tooling there, and we'd love to share that with the development community to get them to understand, how do you deal with systems like this, because they can be very powerful, but how do you deal with them from a development perspective.
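One way to picture what loosely coupled algorithms might mean in code, again as a hypothetical sketch rather than the team's actual tooling, is a set of scorers that share nothing but a candidate-answer structure: each writes its own named evidence dimension, none depends on another's output, and every score is traced so the emergent behavior can be debugged.

Listing 3. Loosely coupled evidence scorers with per-dimension tracing (illustrative sketch)

import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of loosely coupled evidence scorers. Each scorer
 *  writes its own named dimension and depends on nothing else, so scorers
 *  can overlap or disagree; a later merger reconciles the results. */
public class LooselyCoupledScorers {

    static class Candidate {
        final String answer;
        final Map<String, Double> evidence = new HashMap<>();
        Candidate(String answer) { this.answer = answer; }
    }

    /** A scorer sees only the question and one candidate. */
    interface EvidenceScorer {
        String dimension();                        // named for tracing/debugging
        double score(String question, Candidate c);
    }

    public static void main(String[] args) {
        List<EvidenceScorer> scorers = List.of(
                new EvidenceScorer() {             // crude lexical-overlap scorer
                    public String dimension() { return "keywordOverlap"; }
                    public double score(String q, Candidate c) {
                        return q.toLowerCase().contains(c.answer.toLowerCase()) ? 0.8 : 0.1;
                    }
                },
                new EvidenceScorer() {             // crude answer-type scorer
                    public String dimension() { return "typeMatch"; }
                    public double score(String q, Candidate c) {
                        // Pretend only "Chicago" looks like a U.S. city here.
                        return c.answer.equals("Chicago") ? 0.9 : 0.2;
                    }
                });

        String question = "Its largest airport is named for a World War II hero";
        for (Candidate c : List.of(new Candidate("Toronto"), new Candidate("Chicago"))) {
            for (EvidenceScorer s : scorers) {
                double v = s.score(question, c);
                c.evidence.put(s.dimension(), v);
                // Per-candidate, per-dimension trace: the hook debugging needs.
                System.out.printf("%-8s %-14s %.2f%n", c.answer, s.dimension(), v);
            }
        }
    }
}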

developerWorks: Is that the intent, to open it up and have a conversation around the tooling...

Ferrucci: Oh, I think definitely. We held back on a lot of this stuff while we were so focused on performance.

But once we start publishing papers, you'll see how we dealt with this from a tooling perspective, how we allowed that rapid innovation cycle to happen with many different algorithms coming from different...looking at the data from different perspectives.

But the other thing I wanted to mention about developers is that, you know, what we're also interested in is looking at a Watson API, because when you have this capability out there that can do this kind of thing, it would be really interesting to have an API that allows people to leverage that for analyzing information and answering questions and gathering evidence...

But also for teaching it, an API for making it smarter because Watson has the capacity to learn, and I'm not talking about the actual question. Once you give it the answer to a question, it's not very interesting anymore. But when Watson tries to answer a question and fails, it learns what it doesn't know.

It learns what it has failed to learn in the past and can generate a question that you can then teach it by answering, making it smarter in the general case. So we're looking at two kinds of APIs: one for getting information out of Watson and one for putting information into Watson, that I think developers can develop to.
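Neither API existed publicly when this interview was recorded, so the following is purely a speculative sketch of what "getting information out" and "putting information in" could look like; every name in it is invented.

Listing 4. Hypothetical ask-and-teach interfaces (speculative sketch)

import java.util.List;

/** Hypothetical interfaces only, invented for illustration: no such public
 *  Watson API existed when this interview was recorded. */
public class WatsonApiSketch {

    /** A ranked answer with its confidence and supporting evidence. */
    record Answer(String text, double confidence, List<String> evidencePassages) {}

    /** "Getting information out": ask a question, get ranked, evidenced answers. */
    interface AskApi {
        List<Answer> ask(String question);
    }

    /** "Putting information in": the system surfaces the questions it generated
     *  from its own failures, and a human teaches it by answering them. */
    interface TeachApi {
        List<String> pendingQuestions();
        void teach(String question, String answer);
    }
}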

developerWorks: And there's so much that's involved in learning what you don't know, right? I mean, that can go on forever.

Ferrucci: Well, it's actually very, very cool stuff, because when you know you have a question and you have an answer, and then you know that the answer is in this passage somewhere, why did you fail to score that passage as supporting that answer?

So, from that you could generate questions—questions that if you had the answers to, you're learning more about natural language, or you're learning more about context, or you're learning about things that logically entail each other that you weren't able to learn otherwise. You know where you failed.

Think about how a child learns, very similarly, right? They say, you just told me what the answer is, but I was missing something. And then they ask you, well, wait a second, does this abbreviation mean that? Or does that word mean...? And you go yes, and they go, oh, now it makes sense. And they learn something.
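Here is a rough sketch of that loop, under the assumption that you hold the question, the gold answer, and a passage known to support it: find the terms the matcher failed to connect and turn each gap into a question a person can answer. The matching heuristic is deliberately crude and purely illustrative.

Listing 5. Generating questions from a scoring failure (illustrative sketch)

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of failure-driven question generation: given a passage
 *  known to support the gold answer, find the question terms the matcher
 *  failed to connect and turn each into a question a person can answer. */
public class FailureAnalysisSketch {

    static List<String> generateQuestions(String question, String passage) {
        Set<String> terms = new LinkedHashSet<>(
                Arrays.asList(question.toLowerCase().split("\\W+")));
        List<String> asks = new ArrayList<>();
        for (String term : terms) {
            if (term.length() > 3 && !passage.toLowerCase().contains(term)) {
                // The passage supports the answer yet never uses this term, so
                // some synonymy or entailment knowledge is missing; ask about it.
                asks.add("Does anything in the passage mean the same as '" + term + "'?");
            }
        }
        return asks;
    }

    public static void main(String[] args) {
        String question = "Which tick-borne illness causes a bullseye rash?";
        String passage  = "Lyme disease, spread by ticks, often begins with a circular red rash.";
        generateQuestions(question, passage).forEach(System.out::println);
    }
}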

Watson: That's a great segue to what I guess will be our last question. I have a friend who has operated in this space, and he felt that this was kind of the end of the AI winter—I think somebody may have said that publicly.

I'm curious specifically, though, what you're hearing from prospective customers who are interested in this technology, what kinds of practical business problems do they think this could help them solve? And how is IBM, how are we helping them do that?

Ferrucci: So we're real excited about, you know, obviously getting to be on Jeopardy. I mean, we didn't do this to generate a revenue stream by winning the...[ LAUGHTER ]...winning a game show. Did you know that? [ LAUGHTER ]

Watson: Well, if we did, we gave it all to charity, so, I mean...[ LAUGHTER ]...which is a good thing, but...

Ferrucci: That was not the plan. No, I mean, we're really excited about that, because really what...you know, one of the things that we focused on as developers and as researchers was to create a general-purpose capability. And what was exciting about Jeopardy, in fact, was that broad domain, because cracking that one question means nothing. You're not going to see that question again.

So what we created is a capability that can be used in lots of different areas, and really what it's doing is it's analyzing that input data, it's generating many possible answers.

Think about healthcare. When you do differential diagnosis, or when you come up with different treatment options, you're generating many possibilities that are supported by the input data, and then for each one, you're reading journals and reference books, and you're digging into what people are being...what's being said, what the latest information is, what the old information is. You're combining it all to say, which is my best one? Where's the evidence that supports this diagnosis over this one? Why Lyme disease over arthritis, whatever it might be.

And to collect and organize that evidence so that people can drill in, document and understand, you're solving a problem for people and you're giving them an advantage over the data. So rather than the data burying them, you're giving them deeper insight into that data.

In the health field, certainly. Finance, market intelligence, law. There's just a huge number of possibilities where this kind of technology can help.
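To sketch the shape he describes, with invented data and no claim of medical accuracy: generate candidate hypotheses, score each against the evidence gathered for it, and keep the supporting passages so a person can drill into why one candidate outranks another.

Listing 6. Ranking hypotheses by gathered evidence (illustrative sketch)

import java.util.Comparator;
import java.util.List;

/** Illustrative only: rank candidate diagnoses by the evidence gathered for
 *  each, keeping the supporting passages so a clinician can drill in. The
 *  conditions, findings, and scores are invented; this is not medical logic. */
public class DifferentialSketch {

    record Hypothesis(String condition, double evidenceScore, List<String> support) {}

    public static void main(String[] args) {
        List<Hypothesis> hypotheses = List.of(
                new Hypothesis("Lyme disease", 0.72,
                        List.of("journal: bullseye rash strongly associated with Lyme")),
                new Hypothesis("Arthritis", 0.31,
                        List.of("reference: joint pain common, but rash atypical")));

        hypotheses.stream()
                .sorted(Comparator.comparingDouble(Hypothesis::evidenceScore).reversed())
                .forEach(h -> System.out.printf("%-13s %.2f  %s%n",
                        h.condition(), h.evidenceScore(), h.support()));
    }
}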

developerWorks: This is fantastic. Can Todd and I follow you all around and just take this camera and go with you? Dr. David Ferrucci from the Watson Project. Thanks so much for spending some time with us. I'm Scott Laningham with developerWorks, and this is Todd Watson. The whole thing was based on his life, the whole Watson Project... [ LAUGHTER ] He wishes. Anyway, we're here...

Ferrucci: You can have Watson for a first name now.

Watson: Yeah, Watson Watson!

developerWorks: Again, we're at South by Southwest 2011. Thanks for watching.
