I've been learning a ton about all of this, which is to be expected after over a year (!) in this job. I have some terrific colleagues in text analytics, in research, development and services, who continually amaze me with their breadth and depth of knowledge, as well as their passion for the topic and their eagerness to help customers.
I happen to love linguistics, so this job is a great fit for me. I love to read, and the turn of a phrase in a book or a song lyric brings me joy. I like to think about the best way to phrase things and ways to interpret sentences. The more I interact with non-native English speakers, the more I appreciate both the beauty and the limitations of language, and the inherent difficulties in both generating and understanding sentences. I truly enjoyed all my study of French in school, too. It's always been such an intellectual snobbery to say something like "the French have a word for that", but anyone who knows more than one language knows it to be true -- language translation is never exact and concepts cannot always be expressed well, even in one's native language. Bottom line is that it's so interesting for me to dig into and help shape the technology and rules around extracting meaning from unstructured text.
Last month I was talking with a long-time friend and colleague who was here with her company at the IBM Silicon Valley Lab for a technology briefing. She and I have had several conversations at conferences over the years on topics like O-O databases, Java, XML, as they were emerging towards mainstream over the years. In the briefing, we talked a bit about unstructured data in the context of the Information Agenda, and one of their company's thought leaders said that unstructured data inclusion is implied. Cool, but, um, how exactly? Their (very reasonable) response when I probed a little further was that they needed to hook up with the business guys on that. YES! That's where I think we all absolutely have to start -- what is the unstructured data, and what questions do you need answered from it for business value. Then we go into more of the logistics around that.
Specifically, just this week I've worked on a couple of items that can help me bring some meaning to what text analytics is all about to folks who haven't been exposed to it deeply. The first was working on a report for a well-known analyst group, where we describe our information access technologies and offerings for unstructured data. And the second is a new offering that I hope will become available soon, to help quantify needs and specific business value that can be derived from unstructured data. If you are curious about any specifics, a great place to start digging into and even playing with some text analytics technology is the LanguageWare capabilities, system text and UIMA.
If you're interested in this kind of stuff, please let me know, or contact your local IBM rep and ask them (and tell them to ask me if they want a starting contact!) I'm passionate and eager to help! :-)