developerWorks: You're listening to developerWorks interviews, where we feature conversations with technical luminaries and thought leaders from a variety of disciplines on topics of interest to technology professionals. I'm your host, Scott Laningham, and our guest today is Bob Zurek, director, Advanced Technologies and Product Strategy for Information Integration Solutions, IBM® Software Group. He joins us to talk about the Information Server, what it is, and why it's so important today. Bob, it's good to have this opportunity with you. Thanks for doing this.
Zurek: Hey, thanks, Scott. I'm really excited to be on the podcast here and really happy to provide some insight about the great solutions that we offer within the Information Integration Solutions division.
developerWorks: And this is, I did get your title correct, right? There were quite a few words in that. There's a lot of information.
Zurek: [LAUGHTER] You sure did. Thank you.
developerWorks: Why don't we kick off with this: Anytime I hear the terms "information" and "server" in the same sentence or phrase, I immediately think about infoglut. And the presentation you pointed to me before this certainly used that term -- trying to make sense of and use all the data that's overwhelming individuals and companies today. Is that what Information Server is all about?
Zurek: Yes. I think Information Server is a new solution that is really meant to address that infoglut that you just spoke about. I mean, it is amazing how much data is being generated on a daily basis, inside organizations and on the Internet: IM, e-mail, and, with the holiday season fast approaching (actually, we're right in the midst of that), the number of transactions that are being performed and all the customer interactions that are happening over the Internet. You do end up with a whole glut of information. And the Information Server really helps organizations truly understand that data, that information, in a number of different ways.
developerWorks: Yes, I can't imagine. You mentioned holidays. For some businesses, especially retail businesses, I can't imagine the amount of data that flows in this time of year. It must be overwhelming, to say the least.
Zurek: Yes. In fact, it's the opposite, too, as you can imagine going up to your mailbox these days. I'm sure it's filled with lots of catalogs, and frequently the same catalog shows up five or six times. Recently, I walked up to my mailbox and got three catalogs from a very prominent vendor. And the names on them were R. Zurek, RM Zurek, and Robert Zurek. And you know, I eventually did call the company and said, "You know -- I only need one of those catalogs." And they were apologetic and straightened out the information. One of the ways the Information Server can be valuable to organizations like retailers is its capability to help clean up poor information. So de-duplication of names -- customer names -- so that you don't end up spending a lot of money sending out duplicate catalogs or duplicate mailings. And some of that can become very expensive, especially when it comes to things like catalogs.
So the Information Server also helps organizations collect data and move that data to centralized sources like data warehouses. But it also has key capabilities for data cleansing and capabilities to ensure quality and consistency by standardizing, validating, matching, and merging information to create comprehensive and authoritative information for multiple uses. This would be a single view of the customer. So instead of having all these variations of the same name associated with a single customer, the company can use the Information Server to cleanse that information and put it into what we would call a gold master customer record.
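The standardizing, matching, and merging described above can be illustrated with a minimal Python sketch. It is not Information Server code. The matching rule (same surname, same first initial, same postal code), the function names, and the postal codes are all assumptions invented for illustration. A real data quality engine would use far richer rules.

```python
import re
from collections import defaultdict


def standardize(name):
    """Lowercase the name, drop punctuation, and split it into tokens."""
    cleaned = re.sub(r"[^a-z\s]", "", name.lower())
    return cleaned.split()


def match_key(record):
    """Hypothetical matching rule: surname + first initial + postal code."""
    tokens = standardize(record["name"])
    return (tokens[-1], tokens[0][0], record["zip"])


def merge(records):
    """Keep the longest form of the name as the master record."""
    best = max(records, key=lambda r: len(r["name"]))
    return {"name": best["name"], "zip": best["zip"], "sources": len(records)}


def deduplicate(records):
    """Group records that share a match key, then merge each group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [merge(group) for group in groups.values()]


# Postal codes here are placeholders, not real customer data.
mailing_list = [
    {"name": "R. Zurek", "zip": "00001"},
    {"name": "RM Zurek", "zip": "00001"},
    {"name": "Robert Zurek", "zip": "00001"},
    {"name": "Scott Laningham", "zip": "00002"},
]

masters = deduplicate(mailing_list)
for master in masters:
    print(master)
```

With this rule, the three Zurek variants collapse into one master record, so only one catalog would be mailed. A rule this loose would also merge two different people who share a surname, initial, and postal code, which is why production matching usually weighs more fields.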
developerWorks: How big a problem is that today, that duplicate information thing? Do you have a sense of that?
Zurek: Yes. I mean -- it's a multibillion-dollar problem out there. And the good news is that there's great technology in the Information Server to address corporations' needs. And we see that being embraced by organizations. In fact, some organizations are creating roles like the chief quality officer, chief data quality officer, or chief data officer, to really address this from both a technology and a business perspective, with solutions to help the organizations get rid of that infoglut and really turn that data into very valuable and meaningful information.
developerWorks: Now, what are the building blocks of the Information Server -- what you're talking about here, kind of from a high-level standpoint?
Zurek: Sure. The Information Server, first of all, technologically supports a lot of different operating systems, the who's-who of operating systems from mainframe to the personal computer to midrange servers that run on things like Linux® and Windows® and AIX, and other operating systems. So at the basis, it's a very easily deployed solution across a variety of different platforms. But the key building blocks are really technology to connect to different data sources. The only way you're going to be able to get access to those data sources out there in the form of data silos is really having the complete and comprehensive level of connectivity. So we can connect -- the Information Server can really connect to enterprise application sources like SAP and PeopleSoft, and JD Edwards, and Siebel systems. And also relational database systems, so DB2® is a good example to talk about. But also other data sources like Oracle, and Microsoft's SQL Server, and Sybase and some of the other typical database systems, as well as flat files and file systems that sit on mainframes and whatnot.
So part of the information integration, you really can't have an Information Server unless you have great connectivity, both inbound connectivity and outbound connectivity. So that's key. Another key component is being able to reach into those data sources and make meaning of the sources. So what do the tables look like, what do the columns look like, what do the rows look like. So we call that data profiling -- really getting you insight into and understanding of the source data systems before you start extracting that data and moving it into places like a data warehouse.
From the profiling of the data, the Information Server has this data quality component that we've been speaking about. And then from there, a transformation component. So depending on where the data is going, it may need to be transformed. So numbers may need to be transformed to strings, data may need to be aggregated. And that's really the core function of the transformation engine that is a critical element of the Information Server.
From there, there's a very, very rich library of transformation capabilities to be able to do those types of transforms. And from there, the data goes off to its targets -- in this case, maybe a DB2 data warehouse -- where you then could run your business intelligence from tools like Cognos and Business Objects and MicroStrategy. Those are some of the core components. And then there's a whole Service-Oriented Architecture around this and a business process management capability to orchestrate these types of integrations to move the data around. And we're really excited about how our customers are taking advantage of these components as part of the Information Server.
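The flow from profiling to transformation to aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Information Server's actual engine. The column names (`store_id`, `amount`), the sample rows, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def profile(rows):
    """Summarize each column: value types seen and count of missing values."""
    report = {}
    columns = {col for row in rows for col in row}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "types": dict(Counter(type(v).__name__ for v in present)),
            "missing": len(values) - len(present),
        }
    return report


def transform(row):
    """Convert the numeric id to a string and the text amount to a number."""
    return {"store_id": str(row["store_id"]), "amount": float(row["amount"])}


def aggregate(rows):
    """Total the amounts per store, as a warehouse load step might."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store_id"]] += row["amount"]
    return dict(totals)


# Invented sample data standing in for a source table.
source_rows = [
    {"store_id": 101, "amount": "19.50"},
    {"store_id": 101, "amount": "5.50"},
    {"store_id": 202, "amount": "12.00"},
    {"store_id": 202, "amount": None},
]

report = profile(source_rows)
clean = [transform(r) for r in source_rows if r["amount"] is not None]
totals = aggregate(clean)

print(report)
print(totals)
```

Profiling first is what reveals that `amount` arrives as text and is sometimes missing, which in turn tells you which transforms and filters the load needs.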
developerWorks: So then surfacing this data in the form of services you can actually use and take advantage of then.
Zurek: Yes. As you can imagine, if a call center rep -- you know, you call into a call center for maybe some support on your computer or your television or any consumer electronics product -- you're expecting them to know quite a bit about you and the products that you have by providing them with enough information. Well, there may be a process of -- they have to collect information from various sources to get that on to the screen that they're looking at from a customer information standpoint. And being able to invoke that in real time through a service-oriented interface is a key capability of the Information Server. It makes it a lot easier. The integration services become reusable components for a lot of different processes within the organization.
developerWorks: So if we were looking at just terms, singular terms for some of these things you've been talking about, there's the discovery level and then there's kind of an analysis level. What terms would you give for each of these blocks or facets you've been talking about here?
Zurek: Well, you know -- we would talk about data profiling. We would then talk about data quality, data transformation capability, data connectivity. And then the runtime. I will also remind the audience that, with very large volumes of data, a lot of times people will try to tackle this using manual coding mechanisms, writing it in languages like PL/SQL. And what's happened over time is that those approaches haven't been very scalable. They've been time-consuming and -- you know, the developers that try to do this in a manual way end up struggling at times to deliver the data to the business users.
And so the other thing is as the data volumes continue to grow massively, moving this data cannot happen down a single pipe, right? So think about this for a moment. You're approaching a toll booth and here in Massachusetts, where our business unit is located, we have a highway called the Massachusetts Turnpike, and as you enter in Boston, one of the things you're going to hit is one of the six- or seven-lane toll booths. Now imagine each car as an element of data. Imagine if there was only one toll booth. Things get bogged down. You know, the traffic would be lined up probably back 30 miles, right? And so the analogy here is that the Information Server turns that single toll booth into multiple toll booths. So we call that a parallel-based approach to data movement. And those things get very complicated if you have to do it manually. Well, the Information Server automates the whole thing, so if you're deployed on a multi-CPU unit, the data can move in parallel. That means going through multiple toll booths vs. a single toll booth. And I like to use that analogy because I think people can really relate to it as the bits of data are flying through -- imagine getting stuck in that toll booth for a minute versus the option to go through six or seven lanes or 20 lanes, depending on the number of CPUs you have.
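The toll booth analogy maps onto a partition-and-process pattern, sketched here in Python. This is a hedged illustration of the general idea only. The function names and the round-robin partitioning are assumptions, not how the Information Server is implemented. A thread pool keeps the sketch self-contained. For CPU-bound work on a multi-CPU machine, Python's `ProcessPoolExecutor` would be the closer analog of one lane per CPU.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, lanes):
    """Deal records round-robin into `lanes` partitions, one per toll booth."""
    return [records[i::lanes] for i in range(lanes)]


def process_lane(records):
    """Stand-in for the per-record work done while data moves through a lane."""
    return [record * 2 for record in records]


def move_in_parallel(records, lanes=4):
    """Send each partition through its own worker and collect the results."""
    parts = partition(records, lanes)
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        results = list(pool.map(process_lane, parts))
    # Flatten lane by lane; the original record order is not preserved.
    return [item for lane in results for item in lane]


cars = list(range(20))  # each "car" is one element of data
moved = move_in_parallel(cars, lanes=4)
print(len(moved), "records moved through 4 lanes")
```

The part that gets complicated by hand is everything this sketch skips: choosing the number of lanes, keeping partitions balanced, and recombining results in a meaningful order.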
developerWorks: Right. That way, you don't have data going back home and sleeping in in the morning. [LAUGHTER] You can get to work.
Zurek: Yes -- or never showing up at the door, right? Or the workplace.
developerWorks: How many of these elements are in place with the typical midsize or larger enterprise? Are we talking about leveraging some legacy stuff, but also bringing in a lot of new technology, too?
Zurek: Yes, that's right. I mean, in the past, a lot of companies have deployed what they would call ETL or Extract, Transform, and Load technology. And that's a key element of the Information Server. So those things exist today. Part of the Information Server is to provide the ETL process. But more and more organizations are really making decisions to deploy the Information Server as a whole because it incorporates all the characteristics and traits you would have in moving information through an organization.
And then we see over time, more and more information coming in the form of unstructured data and semi-structured data like XML. And they'll want to move it into a database like DB2 9, which has phenomenal support for XML. So if the data source is in XML, the Information Server can handle that data from different sources and move that into technologies like DB2 9 with their native XML capability. So new innovations are always coming to bear here.
developerWorks: What kind of developer skills are most important for implementing and maintaining this kind of transformation?
Zurek: I think that's a great question. First of all, I will say that the demand for data-oriented developers is significant. It's significant. It's probably one of the highest. According to market research and stats, it's one of the most high-demand jobs out there in the sense of companies looking for data-oriented experts. I'll tell you, if you go up to Monster.com and you pump in ETL or data quality, you'll see hundreds, if not thousands, of positions open there. So I think there's huge opportunity for data-oriented developers to get up to speed on the Information Server and really provide good career and future career opportunities for them.
But typically, you'll want someone that has good background in relational database systems. They're comfortable with the knowledge of how to transform data, perhaps through previous experience using manual coding techniques. I think it's always good to have a good experience in technologies like data warehousing, business intelligence, OLAP (Online Analytical Processing) solutions. All those are great skills to have. But frankly, if you're a DBA, those skills could easily be transformed into those of an integration specialist who provides significant value to organizations using the technologies of the Information Server. So we're really excited about helping developers get skilled up on the Information Server and showcase the fact that these are great career opportunities for them.
developerWorks: And maybe I should ask you right here about mentioning some good Web resources with IBM and developerWorks for people to go and start developing these skills, or at least enhancing them.
Zurek: If you go to the main IBM.com site and in the search box in the upper right-hand side, if you type in "Information Server," you'll get to a broad amount of information about the Information Server.
On the other hand, there's a lot of great articles on developerWorks about data, data management, data integration. And I'm really excited about the IBM developerWorks blogs, which is IBM.com/developerworks/blogs, where you'll get a lot of insight from a variety of technologists at IBM -- very smart group of bloggers -- which is a great source of kind of perspective on what's going on. And then one other Web site that I think our listeners would enjoy going to is IBM.com/blogs/iod. IOD stands for Information On Demand. And on that Web site, you'll find a variety of bloggers. You'll probably find me frequently blogging on this topic, on this whole notion of information on demand, which really embraces the concept of the Information Server. So that's a few of the resources that I think our listeners might find valuable to get up to speed on this topic area.
developerWorks: Absolutely. And you're, of course, a blogger on developerWorks. Your blog is a place that people should certainly check out, visit regularly, and if for no other reason than today, I see a lovely Flexible Flyer photo.
Zurek: Yes -- I'm one of those wacky bloggers. One day I'll be talking about the top IT control weaknesses and the next day I might be talking about Flexible Flyers -- you know, the sled we used to use as kids. [LAUGHTER] So I'm prepping mine for a race down the big hill as soon as we get some white stuff.
developerWorks: So you're going to spray the bottom of it with margarine like Chevy Chase did in the movie?
Zurek: I may not go that far. But a good snowboard or ski wax does the job. [LAUGHTER]
developerWorks: And again, you can find Bob's blog on developerWorks at IBM.com/developerworks/blogs. And I will also mention all of this in my blog, which means then you'd have to search for that one to find the others. So I'm not sure that really solves anything.
You know, Bob, kind of as a closing thing, I should certainly ask you to talk a bit if you would about what businesses are doing with Information Server. Maybe you could point to some examples of impact with this.
Zurek: Sure. There is always an opportunity to see real stories about our customers adopting this on the Web site that talks about Information Server, again just going to IBM.com and going into the search box and putting in Information Server. And we actually have other podcasts and video testimonials by our customers. And some of the customers that stand out are people like Blockbuster Video, a very consumer-oriented business, responsible for being one of the largest providers of video, whether it's rental, purchase. And a lot of us that live in the cities and the surrounding areas find a Blockbuster pretty close by. And they're leveraging the capabilities of the Information Server to really get a grasp on what's going on in the business, what kinds of videos are being rented. You know, what are the prominent videos, the top 10, and so on and so forth, and really making sure that they continue to leverage that information to properly market to their customers.
So a lot of CRM initiatives are leveraging the capabilities of the Information Server. They're able to really rapidly do analytics to see trends and adoption, which allows them to adjust their inventory and hopefully squeeze more profits out of their business to ensure their success over time. Harley-Davidson is another terrific example. They were recently speaking at our Information On Demand conference in October that we had in Anaheim, with great success. And Harley-Davidson is a very aggressive user of predictive analytics solutions. And the only way they can achieve success with analytics and get a really good view of what's going on from manufacturing to marketing to customer interactions is by having the proper data in place in their warehouses, and they leverage the capability of the IBM Information Integration Solutions product line to do that. So we're really excited about having them as customers. And that's just the tip of the iceberg of the types of successes we're seeing in our customer base.
developerWorks: Do you expect a real spike in activity around Information Server in the coming months?
Zurek: Absolutely. I mean we look at it this way. You know, many years ago we saw the emergence of the application server. And the application server was responsible for growing a lot of companies, including IBM with its WebSphere Application Server business. And the application server was really responsible for serving up applications to end users or Web applications to consumers or via the intranet. We saw incredible growth there, with great businesses spinning up and a very large scale level of adoption. And now it's pretty much mainstream. If you're going to start an application development project, it's likely that an application server will be involved. So we compare this to the way the application server grew. We believe now it's time for the Information Server. And you know, applications are only as good as the data that they present. And the Information Server will take the lead role in prepping that information, getting it to the right place at the right time, any time, anywhere.
developerWorks: Well, that makes me happy, because even though I don't work in that arena, just thinking about all that data just freaks me out.
Zurek: [LAUGHTER] It freaks a lot of people out. But we're glad to help them overcome those situations.
developerWorks: Bob, this has been very informative, and I really appreciate you making time for this. Maybe we should follow it up again in a few months to see how things are going.
Zurek: Would love to. And I really appreciate all the work you're doing with the podcasts, and it's really exciting to have you hosting these for us.
developerWorks: Well, it's a lot of fun. I know you did some radio yourself, as well, didn't you?
Zurek: Yes, but don't hold that against me. [LAUGHTER]
developerWorks: Well, that's why you're so comfortable doing this. I need a co-host, so we'll have you back.
Zurek: Thank you very much, Scott. All the best to you.
developerWorks: Our guest again today has been Bob Zurek, director, Advanced Technologies and Product Strategy for Information Integration Solutions, IBM Software Group. Find links to many of the things talked about in this podcast on our developerWorks podcast pages at IBM.com/developerworks/podcast. You'll also find a link on that page to my blog, which includes an entry related to this podcast, with links, as well.
For Bob Zurek and everyone at developerWorks, I'm Scott Laningham. Talk to you next time.
IBM Information Server
Bob Zurek's developerWorks blog
Information Server scenarios
Scott Laningham, host of developerWorks podcasts, was previously editor of developerWorks newsletters. Prior to IBM, he was an award-winning reporter and director for news programming featured on Public Radio International, a freelance writer for the American Communications Foundation and CBS Radio, and a songwriter/musician.