An Analysis of IBM DB2 Enterprise Content Management Products

This article lays out the complementing data models and functionality of DB2 Content Manager and DB2 Content Manager OnDemand and shows how some use case scenarios can be solved by using both products together.

Naga (Arun) Ayachitula (nagaaka@us.ibm.com), Senior Software Engineer, IBM T.J. Watson Research Center, Hawthorne, NY

Naga Ayachitula (Arun) is a senior software engineer at the IBM T.J. Watson Research Center, Hawthorne, NY. Naga has a Master's degree in Computer science. Naga is involved in the development of enterprise content management solutions for the media industry. Naga can be reached at nagaaka at us.ibm.com.



Michael Schwartz, Senior Software Engineer, IBM T.J. Watson Research Center, Hawthorne, NY

Michael S. Schwartz is a senior software engineer at the IBM T.J. Watson Research Center, Hawthorne, NY. Michael has a Ph.D. in Mathematics. His projects have included Database systems, Decision Support Systems, Multimedia Systems and Financial Systems. He can be contacted at miss at us.ibm.com.



14 August 2003

© 2003 International Business Machines Corporation. All rights reserved.

Introduction

In this article we describe the benefits and issues in the latest versions of IBM® DB2® Content Manager (CM) and IBM DB2 Content Manager OnDemand (CM OnDemand). We describe the data models of CM and CM OnDemand, then we'll show you examples for each of these data models to establish the benefits of each model. Because both products have their core strengths, not all content management issues are addressed by each product. We'll describe an enterprise content management scenario in which both CM and CM OnDemand can work together to meet the requirements of the scenario.


Overview of IBM DB2 Content Manager and IBM DB2 Content Manager OnDemand

DB2 Content Manager, the core of the IBM portfolio for enterprise content management, provides a single, open, and comprehensive platform for managing, sharing, reusing, and archiving of all types of digitized content. The multi-tier, distributed architecture offers:

  • Scalability to grow from a single department to a geographically dispersed enterprise
  • Openness to support multiple operating systems, databases, applications and resources
  • An XML-ready data model
  • Integration with mission-critical applications and middleware such as Siebel, PeopleSoft, DB2 Records Manager, WebSphere® MQ Workflow, and WebSphere Portal for Web content management

IBM DB2 Content Manager OnDemand is part of the Content Manager portfolio of enterprise content management middleware. Automated capture and powerful indexing, with immediate availability of and instant access to bills, statements, and invoices, support customer service and improve operations. Advanced functions include CD-ROM distribution and PDF indexing. Electronic statement presentment supports call-center productivity and Internet-enabled customer self-service.


Understanding DB2 Content Manager

This section describes the DB2 Content Manager data model and highlights its strengths and restrictions.

DB2 CM data model

Figure 1 shows the data model used by DB2 CM.

Figure 1. Data model

The DB2 CM data model is an object-oriented relational data model. An Item Type consists of "items," which have associated attributes assigned to them. Item Type is the main component of the data model that contains all the child components, if any, and the associated data. An Item type has:

  • A root component - the first or only level in the hierarchical item type.
  • Zero or more child components - optional second or lower levels in the hierarchical item type.
  • Classification - There are two types of system-defined item type classifications:
    • Non-resource item type - represents entities that are not stored in a resource manager. Such items are stored as metadata on the library server.
    • Resource item type - represents objects stored in a resource manager. Such items describe and point to content on the resource manager server, such as video, images, files, and other data.

Dynamic data objects (DDOs) represent components (root components, child components, and resource components) in the data architecture. Each DDO is uniquely identified by a persistent data identifier and holds data items for its attribute values and content. Each data item has a data identifier, a name, a value, and properties (such as nullable, data type, and so on). Links, or references, are represented as data items, each of which refers to another item (resource or non-resource) in another item type. A link relates two items and provides the means to access the linked item. The link relationship has a name that identifies it, such as "contains" or "has". Only the root component of an item may be linked to or linked from. "Outbound" links are the ones that have this item as a source. "Inbound" links are the ones that have this item as a target.
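
The relationships described above can be pictured with a small sketch. This is not the DB2 CM API; the class names (Component, Link, Item, Library) and the sample item types are hypothetical and exist only to illustrate root and child components, resource versus non-resource items, and inbound and outbound links.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Component:
    """A root or child component: a named set of attribute values."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    children: List["Component"] = field(default_factory=list)


@dataclass
class Link:
    """A named, directed relationship between two items."""
    name: str          # for example "contains" or "has"
    source_id: str
    target_id: str


@dataclass
class Item:
    """An item of some item type, identified by a persistent identifier."""
    pid: str
    item_type: str
    root: Component
    is_resource: bool = False  # True if content lives on a resource manager


class Library:
    """Toy stand-in (hypothetical) for the library server's view of items."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self.links: List[Link] = []

    def add(self, item: Item) -> None:
        self.items[item.pid] = item

    def link(self, name: str, source: Item, target: Item) -> None:
        # Links attach to items (their root components), never to children.
        self.links.append(Link(name, source.pid, target.pid))

    def outbound(self, item: Item) -> List[Link]:
        return [ln for ln in self.links if ln.source_id == item.pid]

    def inbound(self, item: Item) -> List[Link]:
        return [ln for ln in self.links if ln.target_id == item.pid]


library = Library()
policy = Item("P1", "Policy",
              Component("Policy", {"number": "INS-001"},
                        children=[Component("Claim", {"amount": 1200})]))
clip = Item("V1", "Video", Component("Video", {"title": "Damage survey"}),
            is_resource=True)
library.add(policy)
library.add(clip)
library.link("has", policy, clip)
```

Here the policy item has one outbound "has" link, and the video item sees the same link as inbound.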

Strengths of DB2 Content Manager

Key strengths of DB2 Content Manager include:

  • Query language with integrated text search - The query language covers all aspects of the CM data model and is easy to use because the complexity of the system tables is completely hidden. Cached data model definitions are used for efficient execution of queries. Text search, based on DB2 Net Search Extender, is integrated, supporting full text searching and combined full text and index-based searching. Full text search works on text-type metadata attributes and on text content, so an attribute of any length, such as the abstract of a document, can later be searched for any word or combination of words. If the item type is defined as full text indexed, text documents loaded into this type are automatically full text indexed. Also, if a document is stored via the ODMA interface, DB2 CM automatically builds a full text index. The query language also conforms to the XQuery path expressions (XQPE) specification.
  • Federated search - DB2 Information Integrator for Content provides federated search and update for structured and unstructured information across disparate data sources. Target data sources of any type can be configured easily in any combination. New data sources can be added and searched. The results obtained from a federated search are in a consistent data format (technically called dynamic data objects) regardless of the source.
  • Support for video assets - Archival and streaming video retrieval is supported by the video stream resource APIs. Because the content of a video stream object is usually large, persistent operations such as add, retrieve, and update are usually done via IBM VideoCharger Server or a third-party video server using a standard protocol such as File Transfer Protocol (FTP). Video assets can be searched based on their related metadata, and a session can be established to stream the content from the video server directly to the video player. Multi-segment play lists are also supported.
  • Workflow - Document routing provides integrated capability to route work along a predefined process. A process defines the way users perform the work and the route through which work progresses. Different routing alternatives include:
    • Sequential - A sequential flow of steps
    • Branching - Conditional routing based on user action
    • Ad hoc routing - Work is not performed in a predefined manner.
    You can monitor the workflow for productivity, analyze the workload over time, or view the entire history of a specific item.
  • Integration with legacy systems and vertical industry applications - DB2 CM provides an open, published, consistent object oriented set of APIs for easy application integration. This makes it possible to connect and enable application types like Customer Relationship Management, Enterprise Resource Planning, Web applications and legacy system applications.
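
The sequential and branching routing alternatives listed above can be sketched in a few lines. This is a minimal illustration only, not the DB2 CM document routing API; the step names, the "action" key, and the route function are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A step takes the work item and returns the name of the next step,
# or None when the process is finished.
Step = Callable[[Dict[str, object]], Optional[str]]


def review(work: Dict[str, object]) -> Optional[str]:
    # Sequential: always continue to the approval step.
    return "approve"


def approve(work: Dict[str, object]) -> Optional[str]:
    # Branching: the route depends on the user's action.
    return "notify" if work.get("action") == "accept" else "reject"


def notify(work: Dict[str, object]) -> Optional[str]:
    return None


def reject(work: Dict[str, object]) -> Optional[str]:
    return None


# Hypothetical process definition: step name -> step function.
PROCESS: Dict[str, Step] = {
    "review": review,
    "approve": approve,
    "notify": notify,
    "reject": reject,
}


def route(work: Dict[str, object], start: str = "review") -> List[str]:
    """Run a work item through the process and return its step history."""
    history: List[str] = []
    step: Optional[str] = start
    while step is not None:
        history.append(step)
        step = PROCESS[step](work)
    return history
```

The returned history corresponds to the per-item history that can be viewed when monitoring a workflow. Ad hoc routing could be approximated by calling route with a different start step chosen by the user.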

Restrictions with DB2 Content Manager

DB2 Content Manager on its own does not currently provide support for the following:

  • Streamed data from the mainframe.
  • Batch loading of content. CM does not provide any utility for loading content into the system. You can manually ingest documents or files from the CM Client or you can write a utility to load documents in batch.
  • Microsoft® SQL Server. CM supports IBM DB2® Universal Database™ and Oracle for its content repository. However, SQL Server is not supported in this context.

Understanding DB2 CM OnDemand

This section describes the DB2 Content Manager OnDemand data model and highlights its strengths and restrictions.

DB2 CM OnDemand data model

Figure 2 shows the data model used by DB2 CM OnDemand.

Figure 2. Data model used by DB2 CM OnDemand

The DB2 CM OnDemand Server environment includes a library server and one or more object servers residing on one or more nodes. A library server maintains a central database about the reports stored in DB2 CM OnDemand. An object server maintains documents on cache storage, and optionally works with an archive storage manager to maintain documents on archive media such as optical and tape. An object server loads data, retrieves documents, and expires data.

The terms application, application group, and folder represent how CM OnDemand stores, manages, retrieves, views, and indexes data.

  • A folder is the only object through which users query and retrieve data (reports) stored in CM OnDemand. A folder can query more than one application group, provided the application groups have the same database fields.
  • The application group is where you define the database and storage requirements for reports. An application group can contain more than one application, provided the applications have the same database and storage management attributes. Each application represents a report that you want to define to the system.
  • An application describes the physical characteristics of a report. You must assign an application to an application group.
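
The way these three objects relate can be sketched as follows. This is an illustrative model only, not the CM OnDemand administration API; the class names, the field names "account" and "date", and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Application:
    name: str
    data_format: str  # physical characteristics, for example "AFP"


@dataclass
class ApplicationGroup:
    name: str
    fields: Tuple[str, ...]  # database fields used for indexing
    applications: List[Application] = field(default_factory=list)
    rows: List[Dict[str, str]] = field(default_factory=list)  # index rows

    def load(self, app: Application, **values: str) -> None:
        # An application must be assigned to the group before loading.
        if app not in self.applications:
            raise ValueError(f"{app.name} is not assigned to {self.name}")
        if set(values) != set(self.fields):
            raise ValueError("index values must match the group's fields")
        self.rows.append({"application": app.name, **values})


@dataclass
class Folder:
    name: str
    groups: List[ApplicationGroup]

    def __post_init__(self) -> None:
        # A folder may span several groups only if their fields agree.
        if len({frozenset(g.fields) for g in self.groups}) > 1:
            raise ValueError("application groups must share the same fields")

    def query(self, **criteria: str) -> List[Dict[str, str]]:
        return [row for g in self.groups for row in g.rows
                if all(row.get(k) == v for k, v in criteria.items())]


statements = Application("Checking statement", "AFP")
y2002 = ApplicationGroup("Statements 2002", ("account", "date"), [statements])
y2003 = ApplicationGroup("Statements 2003", ("account", "date"), [statements])
y2002.load(statements, account="1001", date="2002-12-31")
y2003.load(statements, account="1001", date="2003-06-30")
folder = Folder("Customer statements", [y2002, y2003])
```

A call such as folder.query(account="1001") searches both application groups through the single folder, which is how users see the data.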

Strengths of DB2 Content Manager OnDemand

The core strengths of DB2 Content Manager OnDemand are:

  • Ability to transform streamed output using Xenos - Support for a broad set of print data streams is provided through tighter integration with Xenos transforms. Transforms, separately priced and available as an IBM offering, include the following:
    • Metacode to AFP
    • Metacode to PDF
    • Metacode to Metacode (for index/capture processing, while keeping native Metacode format)
    • PCL to PDF
    • AFP to PDF
    These transforms are tightly integrated, which makes it easy for the DB2 Content Manager OnDemand administrator to define and capture these output formats, along with the other currently supported formats (AFP, line data, PDF, and so forth), through the OnDemand utilities. Being able to capture Metacode and PCL data streams means that customers with Xerox printers, or with business applications that generate PCL output, can reap the benefits of the Content Manager OnDemand Enterprise Report Management system. In addition, the transforms provide for conversion of Xerox print data streams so that they can be captured, indexed, viewed, and more easily accessed via the Internet. The solution retrieves the data stored in its native format and converts it dynamically into e-content formats such as PDF, XML, and HTML for distribution.
  • Automated data loading - CM OnDemand automatically loads data using the ARSLOAD program, which is the main CM OnDemand data loading and indexing program. It creates index data and loads data into the database and onto the storage volumes. You can configure the ARSLOAD program to monitor specific file systems for report data downloaded from other systems. If the data needs to be indexed, the ARSLOAD program calls the indexing program that is specified in the OnDemand application. The ARSLOAD program then works with the database manager to load the index data into the database, and works with the storage manager to load the report data and resources onto the storage volumes.
  • Service offerings that extend OnDemand functionality -
    • CD-ROM - The Client Data Distribution (ad hoc CD-ROM) services offering extends the capabilities of the OnDemand client by letting users extract data from an OnDemand server and write it to media that can be easily distributed. Other users can then access the OnDemand data from the CD-ROM in the same way that they access data that is stored on an OnDemand server. The ad hoc CD-ROM services offering is designed for low-volume, ad hoc building of a CD-ROM by an end user with the OnDemand client.
    • CD-ROM - The Production Data Distribution services offering supports high-volume, batch processing of input files and documents and the production of multiple copies of CD-ROMs. It is designed for bulk data processing and the distribution of many reports on a regular schedule. It is a highly scalable solution and allows you to include user-defined files along with the OnDemand index data and documents.
    • AFP2WEB Technologies offerings transform and manipulate AFP data into various different formats for both loading data into OnDemand and displaying AFP data on the Web. They are tightly integrated with OnDemand and the OnDemand Web Enablement Kit.
  • Kofax Ascent Capture Integration - An optional feature that extends the standard Kofax Ascent Capture capabilities and the OnDemand archive capabilities. It supports a high-volume production scanning operation that scans documents, extracts index data, saves the documents in a format that can be stored in OnDemand, and then automatically loads the documents and index data into OnDemand.
  • SQL Server support - CM OnDemand supports SQL Server as its content repository in addition to DB2 UDB and Oracle.
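
The automated loading flow described in the list above (monitor a file system, index the report, then load index data and report data) can be sketched as follows. This is not ARSLOAD and does not use any of its options; the "*.rpt" file pattern, the "account|date|..." line layout, and the in-memory database and storage objects are assumptions made only for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List

IndexRow = Dict[str, str]


def index_report(text: str) -> List[IndexRow]:
    """Hypothetical indexer: one row per line shaped like 'account|date|...'."""
    rows: List[IndexRow] = []
    for line in text.splitlines():
        parts = line.split("|")
        if len(parts) >= 2:
            rows.append({"account": parts[0].strip(),
                         "date": parts[1].strip()})
    return rows


def load_pending(watch_dir: Path,
                 database: List[IndexRow],
                 storage: Dict[str, str],
                 indexer: Callable[[str], List[IndexRow]] = index_report) -> int:
    """Process each *.rpt file in watch_dir once and return how many loaded."""
    loaded = 0
    for report in sorted(watch_dir.glob("*.rpt")):
        text = report.read_text()
        for row in indexer(text):
            # Index data goes to the (stand-in) database.
            database.append({**row, "report": report.name})
        # Report data goes to the (stand-in) storage volumes.
        storage[report.name] = text
        # Rename the file so the next scan does not load it again.
        report.rename(report.with_name(report.name + ".done"))
        loaded += 1
    return loaded
```

Calling load_pending on a schedule approximates the monitoring behavior; a second call over the same directory finds nothing new because processed files have been renamed.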

Restrictions with DB2 CM OnDemand

DB2 Content Manager OnDemand does not provide support for the following:

  • A flexible index schema. If the application has more searchable fields than the fixed index schema allows, users might receive more hits than desired. After an application is defined with an index schema, the user is confined to that schema. You cannot later add or modify its fields.
  • Text searching. CM OnDemand does not have text searching capabilities, because it does not include a text search engine. The only text searching available is what standard SQL-like database queries provide.
  • Video/Audio. CM OnDemand does not incorporate video/audio archive, retrieve, and play capabilities.

A use-case scenario

Let's consider a scenario where both CM and CM OnDemand together can meet the needed requirements.

XYZ Bank has several facilities that provide many customers with a broad range of services, including bank accounts, insurance, loans, brokerage, mortgages, venture capital, and so on. The bank mails periodic account statements to its customers. All the information about a customer and the customer's loans, mortgages, and other products is stored on accessible media. Video archives are also stored for items pertaining to insurance and venture capital requirements. The images of bank and personal checks from day-to-day transactions are also stored on accessible media.

The customer information stored is searched by the management personnel for review to approve loans or to monitor the status of the customer activity. These processes are generally done using workflows defined by the bank business procedures. The video archives are generally stored at one central repository and are accessible to other locations via streaming.

The bank intends to improve its service to customers by providing service on demand. The goal is to enable customers to log on to their account and perform daily transactions and generate statement reports thereby reducing the bank service personnel staff and also the number of locations. The bank might already deploy different solutions for each of these diverse requirements. However, because the current systems are not interoperable, it is increasingly difficult to integrate the content as needed. As a result enterprises are struggling to keep up with information integration or looking to solutions that provide enterprise wide capabilities.

Figure 3 depicts the bank's use-case scenario. Although the bank has a centralized system, it still has different repositories, each with different organizational schemes and technical procedures for maintenance. Each of the bank's locations connects into the centralized system using different solutions that target a particular aspect of a bank's transaction.

Figure 3. XYZ Bank current case situation with diverse repositories

Implementing a solution

Figure 4 shows how Content Manager products can be used to implement the above use case scenario.

DB2 Content Manager can archive customer account statements, loans, mortgage, insurance documents, images of checks, and so on. Related video clips are stored into the DB2 Content Manager VideoCharger and managed through the DB2 Content Manager.

Let's say the bank has a new local policy for customers living in Westchester County, New York. Customers who have a good credit score are eligible for an increase in their credit line and should be notified via a customer invitation statement. Bank personnel can query the content repository with queries like: "Retrieve all customers who live in Westchester County, New York, and who have a credit score of more than 700 in their report". You could argue that such information can be stored in a database table and queried as necessary. However, the important point is that bank personnel are not confined to one set of attributes for querying information. Documents that are archived for a customer may also be searched for information that satisfies changing business policies over time.
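
The credit score query above combines attribute predicates with a search of document text. The sketch below shows that combination over in-memory records. It is not the DB2 CM query language; the CustomerDocument class, the attribute names, and the sample customers are hypothetical, and the naive substring match merely stands in for a real full text engine.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class CustomerDocument:
    customer: str
    attributes: Dict[str, Any]  # parametric (indexed) metadata
    text: str                   # document content open to text search


def search(docs: Iterable[CustomerDocument],
           state: str, county: str, min_score: int,
           words: Iterable[str] = ()) -> List[str]:
    """Return customers matching the attribute predicates and all words."""
    wanted = [w.lower() for w in words]
    hits: List[str] = []
    for doc in docs:
        attrs = doc.attributes
        if attrs.get("state") != state or attrs.get("county") != county:
            continue
        # "More than" the threshold, as in the example query.
        if int(attrs.get("credit_score", 0)) <= min_score:
            continue
        body = doc.text.lower()
        if all(w in body for w in wanted):
            hits.append(doc.customer)
    return hits


docs = [
    CustomerDocument("A. Jones",
                     {"state": "NY", "county": "Westchester", "credit_score": 742},
                     "Credit report: no late payments."),
    CustomerDocument("B. Smith",
                     {"state": "NY", "county": "Westchester", "credit_score": 655},
                     "Credit report: two late payments."),
    CustomerDocument("C. Lee",
                     {"state": "NY", "county": "Albany", "credit_score": 780},
                     "Credit report: no late payments."),
]
```

Because the words argument is independent of the attribute predicates, a new business policy can be expressed as a different word list without changing the stored attributes.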

DB2 CM OnDemand archives the periodic customer account statements generated from the bank's existing legacy system streamed output. DB2 CM OnDemand can also scan different customer documents in a variety of formats for archive, retrieval and search after the document is indexed using a defined set of attributes that are looked for in the document.

DB2 Content Manager cannot store the periodic account statements generated from the streamed output of the bank's legacy system. So the bank personnel must ingest the periodic statement into the Content Manager repository. In the above credit score query example, DB2 Content Manager can perform a full text search on the documents stored for a customer and present the query results. However, it cannot notify the customers by dispatching the corresponding predefined invitation.

Figure 4. Current use-case situation implemented with DB2 Content Manager products

One part of the use-case scenario can clearly be solved using DB2 Content Manager and the other part using DB2 CM OnDemand. Another product, DB2 Information Integrator for Content, can perform a federated search on both the DB2 Content Manager and DB2 CM OnDemand repositories in a single query.
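
The idea of a single query fanned out to both repositories, with results returned in one consistent format, can be sketched as follows. This does not use the DB2 Information Integrator for Content API; the stub classes, their internal record layouts, and the "account" and "title" result keys are hypothetical.

```python
from typing import Dict, List, Protocol


class Repository(Protocol):
    """Anything with a name and a search method can be federated."""
    name: str

    def search(self, account: str) -> List[Dict[str, str]]: ...


class ContentManagerStub:
    name = "CM"

    def __init__(self) -> None:
        self._items = [{"acct": "1001", "kind": "loan application"}]

    def search(self, account: str) -> List[Dict[str, str]]:
        # Map the native record layout to the common result shape.
        return [{"account": i["acct"], "title": i["kind"]}
                for i in self._items if i["acct"] == account]


class OnDemandStub:
    name = "CM OnDemand"

    def __init__(self) -> None:
        self._reports = [{"account_no": "1001",
                          "report": "June 2003 statement"}]

    def search(self, account: str) -> List[Dict[str, str]]:
        return [{"account": r["account_no"], "title": r["report"]}
                for r in self._reports if r["account_no"] == account]


def federated_search(repos: List[Repository],
                     account: str) -> List[Dict[str, str]]:
    """Send one query to every repository and merge the normalized hits."""
    results: List[Dict[str, str]] = []
    for repo in repos:
        for hit in repo.search(account):
            results.append({"source": repo.name, **hit})
    return results
```

Each stub hides its own record layout behind the same result keys, so the caller handles loan documents and archived statements identically.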


Conclusion

IBM DB2 Content Manager and Content Manager OnDemand provide state-of-the-art solutions for handling high-transaction, high-volume content in an enterprise-wide content management solution. However, each of them addresses different aspects of the enterprise content management problem domain. With the increasing demand for archival and retrieval of information arising from different enterprise use-case scenarios, deploying CM and CM OnDemand together can resolve many of those scenarios.
