© 2003 International Business Machines Corporation. All rights reserved.
Dr. Mattos will be presenting on "Business Integration and the Role of Data Management" on June 4, 2003. After the Webcast, you will have the opportunity to chat with him. Register now for that Webcast (or the replay).
DB2DD: Nelson, you have led an interesting and fast-paced career in IBM, starting in research and now as a director for Information Integration. How does your work in research now affect the job you are doing?
Nelson: Since I joined IBM I have been working on leading edge data management technologies. I focused on object-relational extensions in DB2® Universal Database™ in the middle of the 90s when that was a key strategic extension. That, in addition to a lot of work in SQL, led me to be the IBM representative for a series of standards efforts to ensure that IBM technology made its way into the standards. When you think about it, the work that I am doing today is not a lot different. As with object-relational at the time, which actually provides the semantic infrastructure for a lot of the information integration work we do now, information integration is the next major step in the data management industry. Data is on the Web, in scanned images, in PDFs, in spreadsheets, and the list just goes on. Because this is a major shift, it's important to keep strong linkages with standards bodies, which is why until recently I was working quite closely with such standards bodies as W3C for XML and Web services.
DB2DD: Information integration is a key play for IBM in the information management space. IBM has been at the forefront in data integration technologies for years. Why is it now getting this focus?
Nelson: If you look at what companies did before Y2K, there was a tremendous focus on new application packages because in a lot of cases it was more cost-effective to deploy new packages rather than fix legacy code to handle Y2K issues. In addition, the world economy was in expansion, and there were large IT budgets. Companies were deploying these packages to support their lines of business without necessarily giving a lot of thought to how to integrate these packages.
After Y2K, the demand to deploy new packages went away from a Y2K perspective, and the economy changed. The economic downturn started the shift from deploying new solutions towards leveraging assets that had already been deployed. The way to leverage these assets is to integrate them to discover new information, to relate information from different repositories, to bring together information about customers and suppliers to make better business decisions, and so on.
And it's not just me saying this. A May 2002 survey by Morgan Stanley shows that integration is a top strategic priority for 35% of interviewed CIOs. IDC estimates that 40% of IT budget is already being spent on integration.
DB2DD: Can you give us some real life examples of particular business or technology problems that are being solved by information integration technologies?
Nelson: OK. Customers have implemented operational data stores or data warehouses, which today contain primarily structured data. To make business decisions, they need to extend the value of the warehouse by combining the information with unstructured, real-time data. For example, assume that a call center rep is handling a call from a customer. The application may need to access data in the ODS and combine it with e-mail from the customer and with scanned images that may represent hard copy letters from that customer.
Another good example is a customer using the Web to manage their finances. If a customer needs to see the value of their total portfolio, that can require the ability to integrate across bank accounts, stock investments, bonds, funds, and so forth. The sources of this data can be quite varied and some of that information must be gathered in real time, such as updated information from Wall Street feeds.
DB2DD: That's interesting. But it seems to me that in those scenarios there is more than just information integration.
Nelson: That's right. IBM recognizes that to solve business integration problems, you need three sets of technologies. First, you need technologies that provide the means to integrate the delivery of information to the user in a consistent way but which also allow for personalization. Second, you need to be able to integrate processes. For example, you need to be able to ensure that any changes to production are reflected all the way from ERP, through the CRM system and to the supply chain. Finally, you need technologies that integrate information that is physically stored in many different repositories, such as e-mail, file systems, the Web, and databases.
DB2DD: Is there a way to classify these technologies simply?
Nelson: Yes, there is really an easy way to think of this:
- Portal technology focuses on the interaction with the user.
- Process integration focuses on business events. For example, a sale is closed, and that event must be shared with the production database, supply chain system, and the CRM system.
- Information integration focuses not on events, but on the state of the business as currently reflected in a repository. Examples of these include the current value of a customer account, inventory levels, and sales figures.
And, as you saw with the scenarios I mentioned earlier, we often see these three technologies used together. A portal may trigger an event which will fire a business process, which will update the state of the business in different repositories, which can be integrated for business analysis to allow for better business decisions. Let's look at the customer who is managing her portfolio via the Web. She sees her account information through a portal, decides to sell some stock, which triggers some coordinated business processes to ensure transactional consistency, which then causes updates to the various repositories that hold information relevant to this sale and this customer. Using information integration, the new value of the portfolio is reflected back to our customer, which enables her to make her next business decision.
DB2DD: Don Haderle spoke to us earlier about the different topologies that can be used for integrating information: data warehousing, where data is moved into a central repository, and federated technology, where the data stays where it is. Does IBM favor one approach over another?
Nelson: No, in fact this is one of the key differentiators between IBM and the competition. IBM recognizes that there are primarily these two approaches: Consolidating data access (moving data to a central repository) and federating the access (integrating not the data but the access to the data). Both approaches are necessary. What you use really depends on the characteristics of the problem you are trying to solve.
You centralize data when the performance of queries is critical such that it demands local access, or because data integration requires costly transformations that cannot be done in real-time. Use federation in the following circumstances: When you need to access real-time data, when data is in very different formats (such as structured and unstructured), or when the volume of data to be integrated is just too large to justify copying the data into a central repository. There may also be privacy or ownership issues of the data that would disallow you from copying the data.
What's important to note, though, is that these are not either-or propositions. We often see these technologies being used jointly. Look at the call center example. The centralized approach is used to build the data warehouse or operational data store with the customer data, and then you extend that store in real time to bring in the unstructured data such as e-mail and scanned documents.
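The criteria Nelson lists for choosing between centralization and federation can be summarized as a small decision helper. What follows is only an illustrative sketch: the class name `IntegrationNeed`, its fields, and the function `suggest_topologies` are hypothetical names introduced here, not part of any IBM product, and real topology decisions would weigh these factors rather than treat them as simple switches.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class IntegrationNeed:
    """Hypothetical description of one integration requirement."""
    query_performance_critical: bool = False
    needs_costly_transformations: bool = False
    needs_real_time_data: bool = False
    mixes_structured_and_unstructured: bool = False
    volume_too_large_to_copy: bool = False
    copying_restricted: bool = False  # privacy or ownership constraints


def suggest_topologies(need: IntegrationNeed) -> Set[str]:
    """Return the approaches that the criteria in the interview point to.

    The result may contain both approaches, because they are not
    either-or propositions.
    """
    suggestions = set()
    if need.query_performance_critical or need.needs_costly_transformations:
        suggestions.add("centralize")
    if (need.needs_real_time_data
            or need.mixes_structured_and_unstructured
            or need.volume_too_large_to_copy
            or need.copying_restricted):
        suggestions.add("federate")
    return suggestions


# The call center example: a warehouse or ODS for fast local queries,
# extended in real time with e-mail and scanned documents.
call_center = IntegrationNeed(
    query_performance_critical=True,
    needs_real_time_data=True,
    mixes_structured_and_unstructured=True,
)
print(sorted(suggest_topologies(call_center)))
```

For the call center case the helper suggests both approaches, which matches the point that the two are often used jointly.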
DB2DD: What about search and analysis? If I have this vast 'virtual information store' of structured and unstructured information, how can end users find what they need? And will analytic tools be able to handle these rich content types and provide insights from such diverse data types?
Nelson: Search is a key component of an information integration infrastructure; otherwise, no person or application will be able to find what they need to integrate. In fact, DB2 Information Integrator products will come with integrated text search to easily find information in many different repositories.
Regarding analysis, let me back up a step. We think it's important for an information integration infrastructure to provide different APIs because the development community doesn't use a single interface. We have the SQL crowd, we have the content management crowd with their OO APIs, and we have an emerging crowd of XML developers. Therefore, to open up this infrastructure to the widest number of developers, it must support the interfaces used by these developers.
So, to get back to the original question about analytical tools: most of these tools currently use an SQL interface; therefore, they will be able to handle any information that can be integrated through the information integration platform. For example, we have been testing with Crystal Reports and other popular reporting tools without significant changes, and in most cases we have even seen significant performance improvements.
DB2DD: So you will give developers some API choices. What other benefits will developers see from developing new applications on this information integration infrastructure?
Nelson: One of the major benefits is the reduced time investment to develop new applications and to maintain them over time. Why is that? Today, if a developer needs to develop a new app that draws on different repositories, the application must connect to the various repositories, write a request in the dialect of each repository, extract the data, and then in the application itself do the joins, correlation, transformations, etc. If later on that application needs to be extended to add another data source, the necessary logic must be added to the application to handle that new data source.
With information integration, this complexity is taken away from the developer. They will simply connect to DB2 Information Integrator, and it will know how to transform that request into the dialect of the backend systems and do the correct correlation, etc. If a new data source is added, the administrator just needs to make sure there is connectivity to the new source and make the minimal change to the underlying view that is accessing the remote data sources.
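The contrast between joining in the application and querying a single integrated view can be sketched with a small stand-in. The example below does not use DB2 Information Integrator; it assumes SQLite's `ATTACH DATABASE` and a temporary view as a rough analogue of federated access, and the schema names `crm` and `sales`, the tables, and the sample rows are all hypothetical.

```python
import sqlite3

# One connection with two attached in-memory databases, each standing in
# for a separate repository. SQLite is only a stand-in for illustration.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS crm")
hub.execute("ATTACH DATABASE ':memory:' AS sales")
hub.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
hub.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
hub.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
hub.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                [(1, 40.0), (1, 60.0), (2, 25.0)])


def totals_joined_in_app(conn):
    """Query each source separately and correlate the rows in application code."""
    names = dict(conn.execute("SELECT id, name FROM crm.customers"))
    totals = {}
    for customer_id, amount in conn.execute(
            "SELECT customer_id, amount FROM sales.orders"):
        name = names[customer_id]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


# The "integrated" alternative: the join lives in one view definition,
# so the application issues a single request against that view.
hub.execute("""
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.amount) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")


def totals_from_view(conn):
    """Ask the view for the already-correlated result."""
    return dict(conn.execute("SELECT name, total FROM customer_totals"))


print(totals_joined_in_app(hub))
print(totals_from_view(hub))
```

In this sketch, adding a third source would mean changing the view definition rather than the application function, which mirrors the maintenance benefit described above.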
DB2DD: What about development tooling?
Nelson: DB2 Information Integrator provides the same interfaces as the DB2 family, so application developers can use the same application development environment and the same tools (Microsoft® Visual Studio, WebSphere® Studio, etc.) as they use today to develop applications on DB2 UDB, DB2 Content Manager and so on.
DB2DD: You've used the word "infrastructure" frequently in describing information integration. What roles will IBM Business Partners have in this environment?
Nelson: Clearly, there are many opportunities for Business Partners, both to leverage what we are offering as well as to complement the offering. As I mentioned before, BI tools can leverage information integration to remove complexity and to allow their business users access to more sources in real time with good performance. There are partners that can complement our offerings by providing connectivity to different sources, such as legacy sources, application packages, and industry-specific repositories. There are partners that can complement the offerings by providing content, such as credit information, financial feeds, and search engines. And there is a role for systems integrators to help users develop wrappers or connectors to backend data sources, to provide deployment services, or to provide advice on which topology to use (federation versus centralization).
DB2DD: Finally, we can't leave you without getting your input on e-business on demand™. Can you clarify the relationship between integration technologies and IBM's on demand initiative?
Nelson: For a business to be on demand, it must have an on demand operating environment that contains three key attributes: integration, automation, and virtualization. Integration creates an environment that provides flexibility to the business by allowing it to integrate unconnected business assets and processes. That means you need the ability to integrate people, processes, and information. Information integration is a key underpinning of the integration attribute of an on demand operating environment.
DB2DD: Thank you very much, Nelson. We're looking forward to hearing you speak in more technical detail on these subjects in your upcoming Webcast.
Nelson: You're welcome. I'm very excited about all of this, and I welcome the chance to spread the word about information integration.
All statements regarding IBM's future direction or intent are subject to change without notice, and represent goals and objectives only.

Nelson Mattos, Ph.D., is an IBM Distinguished Engineer and the Director of IBM Information Integration. In his current role, Dr. Mattos is responsible for establishing IBM's leadership position in the emerging information integration market. He collaborates with standards bodies and IBM customers, Business Partners and development teams to help businesses integrate digital information assets and leverage the value of those assets across the enterprise. Capitalizing on his strong research background, Dr. Mattos is responsible for the strategy, marketing, and development for such products as DB2® Information Integrator, DiscoveryLink®, replication, and Relational Connect.