DB2 CommonStore and its Backend Archive Options - Understanding the Technical and Functional Differences


Abstract

The DB2® CommonStore (CS) product family supports three different backend archives: Content Manager (CM), Content Manager OnDemand (CMOD), and Tivoli® Storage Manager (TSM). Each of these archive options has a unique architecture and particular strengths.

Some functional features and options in CS depend on the backend archive option. This article provides a comparison and a detailed explanation of these differences and how they impact security, indexing, workflow and so on.

Furthermore, some technical aspects of the archive system itself, such as storage, data compression, and document removal, behave differently when used with CS. This article describes the relevant differences and their implications for the CS solution.

The article is targeted at a technical audience that understands the technical concepts of CS and wants to gain a greater insight into the backend archive options.


Introduction

DB2 CommonStore (CS) is a key component of IBM's enterprise content management portfolio. It links business-critical applications with the enterprise content repository and, in this way, provides archiving and retrieval functionality to these applications.

Technically speaking, CS can be regarded as an interface or gateway since it does not store any data or documents itself. It always relies on a backend repository for storage. Today, CS supports the following repositories:

  • DB2 Content Manager (CM)
  • DB2 Content Manager OnDemand (CMOD)
  • Tivoli Storage Manager (TSM)

Each of these repositories can run on many system platforms, ranging from MS Windows®, to UNIX®, to iSeries®, to zOS®.

Today, DB2 CommonStore comes in three flavors, supporting three business applications:

  • DB2 CommonStore for SAP (CSSAP)
  • DB2 CommonStore for Lotus® Domino® (CSLD)
  • DB2 CommonStore for Microsoft Exchange (CSX)

Since CS can itself run on different platforms, Table 1 clarifies which CS product can be run on which operating system together with which backend repository.

Table 1: Cross-reference for CS server platforms with supported backend repositories (current as of November 2003)

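Backend archive columns (A to L):

  Col | Archive          | Archive version | Archive server O/S
  A   | CM               | 7.1 FP1         | Win, AIX, 390
  B   | CM (3)           | 8.1, 8.2        | Win, Linux, AIX, Sun, zOS
  C   | CM Express       | 8.2             | Win
  D   | CM iSeries       | 4.3             | 400 4.3, 4.4, 4.5
  E   | CM iSeries       | 5.1             | 400 5.1
  F   | CMOD (3)         | 7.1 FP15        | Win, AIX, HP-UX, Sun
  G   | CMOD iSeries     | 4.3             | 400 4.5
  H   | CMOD iSeries (4) | 5.2             | 400 5.1
  I   | CMOD 390         | 2.1             | 390
  J   | CMOD zOS         | 7.1             | 390
  K   | TSM              | 4.2, 5.1, 5.2   | Any
  L   | Image Plus       | -               | 390

  CS    | CS version | CS server O/S (2) | A | B | C    | D | E | F | G | H | I | J | K | L
  CSSAP | 8.2        | Windows           | y | y | y(5) | n | n | y | n | y | n | y | y | n
  CSSAP | 8.2        | AIX               | y | y | y(5) | n | n | y | n | y | n | y | y | n
  CSSAP | 8.2        | HP-UX             | n | n | n    | n | n | y | n | y | n | y | y | n
  CSSAP | 8.2        | Sun Solaris       | n | y | y(5) | n | n | y | n | y | n | y | y | n
  CSSAP | 8.2        | iSeries           | n | n | n    | n | y | n | n | n | n | n | n | n
  CSLD  | 8.2        | Windows           | y | y | y(5) | n | y | y | n | y | n | y | y | n
  CSLD  | 8.2        | AIX               | y | y | y(5) | n | n | y | n | y | n | y | y | n
  CSX   | 8.2        | Windows           | y | y | y(5) | n | n | y | n | y | n | y | y | n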

Remarks:
(1) The CS Entry Offerings support the same backend repositories as the CS standard license model.
(2) CS can run on the following Windows versions: 2000, XP, 2003
(3) CS also supports the CM Entry Offering (that includes both CM and CMOD)
(4) CS supports the Common Server component of CMOD iSeries 5.2.
(5) CM Express was announced after CS v8.2, which is why it is not yet referenced in the CS announcement letters. Technically, CM Express comes with all connectors needed by CS, so no technical issues are expected. Nonetheless, development will run additional tests, and as soon as these have been completed successfully, the CS announcement letter will be updated accordingly. Bear in mind, though, that CM Express comes WITHOUT Tivoli Storage Manager, so no tape or optical storage can be used. For this reason, the combination of CSSAP and CM Express, for example, is not a good option for SAP data archiving.

The specific archiving and retrieval functionality depends, of course, on the business application. In the case of SAP, CSSAP uses the SAP ArchiveLink interface to connect to SAP. CSSAP has been thoroughly tested and certified by SAP to fully comply with this interface. The archiving capabilities of CSSAP fall into two areas:

  • CSSAP data archiving where inactive data is offloaded from SAP
  • CSSAP document archiving, where business documents in electronic form (such as PDF or TIFF files) are archived and linked to an SAP business transaction.

CSLD and CSX are designed for e-mail archiving and records management. Both support the two most frequent mail archiving scenarios:

  • Personal mail archiving to offload inactive e-mails in order to reduce the growth of the messaging system
  • Journal mail archiving to capture and store e-mails prior to mail delivery and without possible user manipulation in order to meet legal or regulatory requirements

In addition, CSLD can be used in conjunction with any other Notes® application. Enabling a Notes application for archiving and retrieval with CSLD usually requires some template modifications, including adding some LotusScript code.

Furthermore, CSLD allows you to integrate Notes documents with existing CM applications and workflows by means of specific features, such as archiving into a CM workbasket and converting the proprietary Notes format into an industry-standard format that can be viewed outside of Notes.


Storage

Both CM and CMOD can manage archived items in a file system on hard disk. For long-term storage, both repositories, when run on Windows or UNIX, pass on the archived items to TSM. Whatever backend repository you choose, the archived items will finally end up in TSM. So do the three different backend repositories differ at all with respect to storage?

The answer is definitely yes: each of the three backend repositories stores archived items in its own particular way. As an example, consider archiving Outlook e-mails with CSX: one hundred e-mails, each 80 KB in size, are to be archived and stored in CM, CMOD, or TSM.

In CM, each e-mail is stored as a separate item. This means that there are 100 entries in the CM library and 100 items in the CM file system on the resource manager (object server). Each of these items is moved individually over to TSM for long-term storage. As a result, there are 100 entries in the TSM database and 100 separate items in TSM storage. Each entry in the TSM database is about 750 bytes on average.

CMOD, on the other hand, uses storage objects whose default size is 10 MB. CS puts all e-mails that go into the same application group together in such a storage object. In this example, all e-mails (amounting to a total of 8 MB) are stored in just one storage object in the CMOD cache. For long-term storage, this one storage object is periodically migrated to TSM. Consequently, there is only one entry in the TSM database and just one item in TSM storage.

TSM's primary purpose is the backup and restore of individual files. CS, however, needs to store some additional information beyond the file name. This is why CS creates two items in TSM for each archived e-mail: one item holds the e-mail itself, and a second, very small item holds additional CS-internal information.

Table 2 summarizes the above analyses of the different storage concepts of CM, CMOD and TSM.

Table 2: E-mail archiving example to illustrate the different storage concepts

Backend repository           | CM    | CMOD    | TSM
# of e-mails in MS Exchange  | 100   | 100     | 100
# of items in archive        | 100   | 1       | n/a
# of TSM database entries    | 100   | 1       | 200
Size of TSM database entries | 75 KB | 0.75 KB | 150 KB
# of TSM storage items       | 100   | 1       | 200
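The figures in Table 2 follow from simple arithmetic. The Python sketch below reproduces them under the assumptions stated in the text (about 750 bytes per TSM database entry, 10 MB CMOD storage objects, two TSM items per e-mail when TSM is used directly). It additionally assumes, as a simplification, that CMOD fills every storage object up to its default size; the function name is hypothetical.

```python
import math

TSM_ENTRY_BYTES = 750     # average size of one TSM database entry (from the text)
CMOD_OBJECT_KB = 10_000   # default CMOD storage object, 10 MB in decimal units

def tsm_entries(backend: str, n_mails: int, mail_kb: int) -> int:
    """Number of TSM database entries created for n_mails archived e-mails."""
    if backend == "CM":
        return n_mails                      # one TSM item per e-mail
    if backend == "CMOD":
        # e-mails of one application group share storage objects;
        # assumes every object is filled up to its default size
        return math.ceil(n_mails * mail_kb / CMOD_OBJECT_KB)
    if backend == "TSM":
        return 2 * n_mails                  # e-mail item + small CS-internal item
    raise ValueError(f"unknown backend: {backend}")

for backend in ("CM", "CMOD", "TSM"):
    entries = tsm_entries(backend, n_mails=100, mail_kb=80)
    print(f"{backend:5} {entries:4} entries  {entries * TSM_ENTRY_BYTES / 1000:.2f} KB")
```

Scaling the same model to one million 80 KB e-mails gives 8,000 TSM entries with CMOD, compared with 1,000,000 with CM and 2,000,000 with TSM alone.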

Bear in mind that this example does not take into account any duplicate storage within TSM, which is usually configured to increase data security: if one storage medium is destroyed or becomes corrupted, TSM can automatically access the copy of the archived e-mail in a different storage pool (on a different medium).

The low number of items in TSM can be regarded as a particular strength of running CS with CMOD, especially in large e-mail archiving projects where several million e-mails are archived every year. Fewer entries in the TSM database make the daily operation more efficient since the TSM database becomes smaller and its backup (and also restore) becomes faster.


Meta data (attributes)

Both CM and CMOD allow storing meta data (attributes) together with each archived object since both repositories are based on a relational database. You cannot store any meta data in TSM due to the lack of such a database.

In the case of e-mail archiving, for instance, such meta data could be the subject or sender field of the message. For each attribute, a data format has to be selected that matches the data format in the business application. See Table 3 for an overview of how to map the different data formats in the business application to attributes in CM or CMOD. Note in particular that a timestamp is stored as a variable string in CMOD. For a text variable, the maximum length in CMOD is 254 characters, compared to just under 32,000 characters in CM V8 (just under 4,000 characters in CM V7).

Table 3: Mapping attribute formats

Business application | CM                                        | CMOD
Text                 | Variable character, extended alphanumeric | Variable string, mixed case
Number               | Integer, long integer or decimal          | Integer, small integer or decimal
Date only            | Date                                      | Date
Time only            | Time                                      | Time
Date and time        | Timestamp                                 | Variable string
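Table 3 can be read as a lookup table. The sketch below encodes it in Python and shows the two CMOD-specific points from the text: a timestamp has to become a string, and text longer than 254 characters does not fit. The function names and the timestamp string layout are assumptions for illustration; the text does not say which layout CS uses or how CS itself handles over-long values, so this sketch simply rejects them.

```python
from datetime import datetime

# Table 3: business-application type -> (CM type, CMOD type)
ATTRIBUTE_TYPES = {
    "text":     ("Variable character, extended alphanumeric", "Variable string, mixed case"),
    "number":   ("Integer, long integer or decimal", "Integer, small integer or decimal"),
    "date":     ("Date", "Date"),
    "time":     ("Time", "Time"),
    "datetime": ("Timestamp", "Variable string"),
}

CMOD_MAX_TEXT = 254  # maximum length of a CMOD text attribute (from the text)

def cmod_type(field_type: str) -> str:
    """Return the CMOD attribute type for a business-application field type."""
    return ATTRIBUTE_TYPES[field_type][1]

def to_cmod_value(field_type: str, value):
    """Prepare a business-application value for a CMOD attribute (hypothetical helper)."""
    if field_type == "datetime":
        # CMOD has no timestamp type, so the value is stored as a string.
        # This particular layout is an assumption.
        return value.strftime("%Y-%m-%d %H:%M:%S")
    if field_type == "text" and len(value) > CMOD_MAX_TEXT:
        raise ValueError(f"{len(value)} characters exceed the CMOD limit of {CMOD_MAX_TEXT}")
    return value
```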

There is another interesting difference in how CMOD manages the attributes compared to CM. In CMOD, old attributes can be migrated to TSM and stored on tape. This optional feature might be of particular value when dealing with very long data retention requirements or very large volumes. CM and CMOD also differ in the maximum number of attributes, but even the lower one (32 with CMOD) is definitely high enough when used with CS.

In addition to the application-related attributes, some additional technical attributes have to be set when CommonStore is run together with CMOD. Let's have a closer look and explain why they are needed in the context of CSLD. Table 4 shows these additional attributes for CMOD.

Table 4: Technical attributes with CSLD and CMOD

DOC_ID
  Sample data: 2003071515491916#0C60.5815
  Purpose: Unlike CM, CMOD does not generate a unique document identifier during archiving. This is why CS creates one and stores it in this technical attribute.
  CM equivalent: In CM, each archived item has a unique identifier called ITEMID in the SBTITEMS table.

CONTENT_TYPE
  Sample data: DOC (example for a Word attachment), CSN (example for native archiving)
  Purpose: In CMOD, the content type is always linked to the CMOD application and NOT to the individual document. Since many documents with different content types can be stored in the same CMOD application, this technical attribute is required to provide the correct content type during retrieval.
  CM equivalent: CM data format, CM content type, CM MIME type

WORKBASKET
  Sample data: To be indexed
  Purpose: Emulates a CM workbasket within CMOD. CSLD can archive items directly into a workbasket, list the items of a workbasket, remove items from a workbasket, or move them from one workbasket to another.
  CM equivalent: CM workbasket and worklist

ORIGFILENAME
  Sample data: 3F1476D30E00C605815native.CSN
  Purpose: When archiving file attachments, this technical attribute stores the file name of the attachment. The maximum length of a file name is limited by the length of the attribute (at most 254 characters).
  CM equivalent: In CM, the file name is held in the system attribute ITEMNAME in the SBTITEMS table, which is not visible in the search panel of the CM Client. Note that this field has a maximum length of 50 characters in CM v7. If a file attachment has a name longer than 31 characters, CommonStore cuts off any additional characters during store and retrieval (presumably because the stored name carries the same 19-character prefix as the sample data, leaving 31 of the 50 characters for the file name).

FOLDER
  Sample data: 2003071516571982#07DC.5815
  Purpose: Stores the link to the archived Notes folder; used for Notes folder archiving.
  CM equivalent: Notes folders are stored as CM folders.

ITEMTYPE
  Sample data: DOC or FOLDER
  Purpose: Describes whether the archived item is a document or a folder.
  CM equivalent: Notes documents are stored as CM documents; Notes folders are stored as CM folders.
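The ORIGFILENAME sample data consists of a 19-character prefix followed by the real file name ("native.CSN"). The sketch below strips that prefix and models the CM v7 truncation to 31 characters. That the CM v7 ITEMNAME field carries the same 19-character prefix is an assumption inferred from 50 - 19 = 31; the helper names are hypothetical.

```python
PREFIX_LEN = 19        # characters in front of the real file name (see sample data)
CM7_ITEMNAME_MAX = 50  # maximum length of ITEMNAME in CM v7

def original_file_name(stored: str) -> str:
    """Strip the 19-character prefix from a stored ORIGFILENAME value."""
    return stored[PREFIX_LEN:]

def cm7_retrieved_name(file_name: str) -> str:
    """File name as it comes back from CM v7: at most 50 - 19 = 31 characters."""
    return file_name[:CM7_ITEMNAME_MAX - PREFIX_LEN]

print(original_file_name("3F1476D30E00C605815native.CSN"))  # native.CSN
```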

For enhanced security during retrieval, CSLD requires two additional technical attributes:

  • CSLDOrigUser
  • CSLDOrigDB

These security attributes have to be configured as separate attributes in both CMOD and CM. When a document is reloaded into the Domino database, CSLD compares the current values of the Notes environment (the replica ID of the database into which the document is restored, and the Notes user requesting the retrieval) with the stored security attributes. If they do not match, the restore request is rejected and the job ends in error.
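The check described above can be sketched in a few lines. This is an illustration of the logic only, not CSLD code: the class and function names are hypothetical, and it assumes a plain string comparison, whereas CSLD may normalize Notes names or replica IDs before comparing them.

```python
from dataclasses import dataclass

@dataclass
class SecurityAttributes:
    orig_user: str  # value stored in CSLDOrigUser at archiving time
    orig_db: str    # value stored in CSLDOrigDB (replica ID) at archiving time

class RestoreRejected(Exception):
    """Raised when the Notes environment does not match the stored attributes."""

def check_restore(stored: SecurityAttributes, requesting_user: str, target_replica_id: str) -> None:
    """Allow the restore only if both the user and the target database match."""
    if stored.orig_user != requesting_user:
        raise RestoreRejected(f"user {requesting_user!r} does not match CSLDOrigUser")
    if stored.orig_db != target_replica_id:
        raise RestoreRejected(f"replica {target_replica_id!r} does not match CSLDOrigDB")
```

For example, a document archived by the (hypothetical) user "CN=Jane Doe/O=Acme" from replica "85256D6A:0048F3C2" can only be restored by that user into that database; any other combination raises RestoreRejected.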


Compression

Compression is of particular importance for e-mail archiving where very high data volumes are off-loaded. E-mails usually contain a lot of line data (or attachments with line data) that can be compressed considerably. The average compression rate that we have seen in messaging environments is about 50 %.

Compression not only saves storage space, it also has a very positive impact on retrieval performance if tape or optical storage is used: the more data fits on one medium, the fewer media changes are required for retrieval. Since mounting a medium in a drive is the most time-consuming step of a retrieval, the average retrieval time can be lowered significantly. Compression also helps keep the number of parallel drives low.

Let's now have a look at the different backend repositories when using them with CSLD or CSX.

CM has no built-in compression. Moreover, we have observed that the CM V7 Object Server suppresses TSM software compression even when it is activated. As a result, the storage space used within TSM is as large as within CM. In the case of attachment archiving, this means that the storage space for an attachment in TSM can be even larger than the original space in Domino, since Domino stores file attachments in compressed form. (A possible way to overcome this paradoxical situation is to activate compression at the operating system level, such as the file compression built into Windows.)

This is no longer the case in CM V8. The CM v8 Resource Manager no longer suppresses the TSM software compression.

CMOD comes with a very efficient built-in compression mechanism. The compression rate is very similar to ZIP and is on average 50 % for e-mails. Since compression is already done on the CMOD Server, there is no need to turn on TSM software compression. The storage size within TSM equals the one within CMOD.
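The effect of compressing twice is easy to demonstrate. The sketch below uses Python's zlib module (DEFLATE, the method used by ZIP) as a stand-in, since the text only says that CMOD's rate is similar to ZIP and does not describe its algorithm. The sample mail body is synthetic and far more repetitive than real mail, so its ratio is much better than the roughly 50% quoted above.

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Size after DEFLATE compression divided by the original size."""
    return len(zlib.compress(data, level)) / len(data)

# Synthetic, highly repetitive mail body; real mail compresses less well.
mail_body = ("Subject: Weekly status\r\n\r\n" + "The project is on track. " * 200).encode("ascii")
once = zlib.compress(mail_body)

print(f"mail body:          {compression_ratio(mail_body):.0%}")
# Compressing data that is already compressed gains nothing, which is why
# TSM software compression adds no value after CMOD has compressed the data.
print(f"already compressed: {compression_ratio(once):.0%}")
# Random bytes stand in for incompressible content such as JPEG images.
print(f"random bytes:       {compression_ratio(os.urandom(10_000)):.0%}")
```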

TSM allows client-based software compression. Unfortunately, though, we have learned that CSLD and CSX suppress TSM software compression. The reason is that CommonStore also permits partial retrieval of archived objects: for example, you can jump to a certain position within an archived object and retrieve only a subset of that object, as in the case of SAP print lists and archived SAP data. TSM, however, does not support partial retrieval of an item that has been archived with TSM software compression. As with CM V7, the archived attachment might therefore be even larger than in Domino!

TSM also supports hardware compression in conjunction with selected tape drives. If such a device is available, the compression applies to the CM, CMOD, and TSM backend repositories alike.

Choosing the appropriate storage device is part of the CommonStore system design. If you plan to use a tape system with built-in hardware compression, the observed differences in compression between CM, CMOD, and TSM are of little relevance. If you plan to use a storage device without hardware compression, however, we recommend running CSLD with CMOD as the backend archive. This configuration needs the least storage space, thanks to the efficient built-in compression of CMOD.


Retrieval from the Archive Client

In some CommonStore (CS) projects, there is a requirement to make the items archived from SAP, Domino, or Exchange accessible from the "Archive Client", which can be either a thick, Windows-based client application or a thin, Web browser-based client.

Such retrieval is always the result of a prior search; that is, all archived items must be stored together with metadata in the backend repository.

CM comes with comprehensive search features for the metadata. In addition, full-text indexing and search has been tightly integrated into the base product with CM v8. There are now two types of full-text indexes:

  • A full-text index on the attributes (meta data)
  • A full-text index on the archived item itself

The latter can only be generated if the data format (MIME type) of the archived item is supported by a suitable file filter. All these search functions can also be used for items that have been archived by CS. As an example, assume that several file attachments with different extensions like DOC, PDF, GIF, JPG, and PPT were archived by CS. The CM v8 full-text engine will update its index periodically and include the content of the DOC, PDF, and PPT files; graphic formats are automatically excluded. You can then do a full-text search from the CM v8 Client.

There are some specific points to consider for e-mail and SAP archiving. Outlook e-mails are stored by CSX as MSG files in CM. The filter that ships with CM works on the message body; attached files are not filtered. Notes e-mails are stored by CSLD as CSN files in CM. This format is Notes proprietary; there is no specific filter shipped with CM. Instead, a generic filter is used whose results might be less precise.

An alternative approach for optimizing full-text search is to implement two-step archiving. In the first step, just the attachments are offloaded and full-text indexed. In the second step, the mail body is archived, with an optional format conversion in the case of Domino.

Similar considerations for full-text search and filtering apply to SAP print lists stored as ALF files (a proprietary SAP format) in CM. It is a particular strength of CM v8 that it comes with an open interface for the file format filtering that allows adding a new filter for a specific proprietary data format.

Due to the usually huge volumes (several TB) in e-mail archiving, the full-text index can also become very large, in the range of several hundred GB. CM v8 allows you to split the index into several blocks that can be joined at search time. Nonetheless, the sheer size of the full-text data causes additional operating costs (maintenance, backup) and additional investment in hardware (storage, CPU).

The flexibility of Domino allows an alternative approach to make the full-text index more manageable. It is based on creating an abstract of each mail, limited to a maximum size. CSLD is able to store this abstract as an attribute in CM v8. If this attribute is enabled for full-text indexing, the user can do a full-text search on the abstract both from the CM Client and in the Notes Client through the CSLD search functionality. This approach might be an excellent compromise between optimizing the search results and keeping operational costs of the archive at a reasonable level.
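Creating such an abstract is essentially a bounded truncation. The sketch below is one possible approach, not the CSLD implementation: it normalizes whitespace and cuts at the last word boundary that fits. The function name and the default limit of 1,000 characters are hypothetical; the text does not state which maximum size CSLD uses.

```python
def make_abstract(body: str, max_chars: int = 1000) -> str:
    """Return a whitespace-normalized abstract of at most max_chars characters.

    The text is cut at the last word boundary that fits; a single word longer
    than max_chars is cut hard.
    """
    text = " ".join(body.split())            # collapse line breaks and runs of spaces
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars + 1)  # last space at or before max_chars
    return text[:cut] if cut > 0 else text[:max_chars]

print(make_abstract("Hello team,\n\nplease find   the minutes attached.", 30))
```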

Items that have been archived by CS can easily be viewed with CM. The data format (content type, MIME type) that CS sets during archiving for each individual item is used to select the correct viewing method. The thick CM Client comes with a built-in viewer that supports many file formats and allows you to add annotations (such as stamping the document or highlighting text). Any annotations, though, are only accessible from the CM Client. If the annotated document is later retrieved to Lotus Notes, for instance, the annotations are not visible.

For the other file formats, the CM Client can launch a pre-defined external viewing application. No annotations can be saved in this particular scenario.

Although CMOD offers strong metadata search, some text search (on the content, but without a pre-built full-text index), and retrieval and viewing capabilities, accessing items that originate from CS through the CMOD Client has long been considered impractical in a production environment. The reason is related to viewing. Viewing in CMOD is defined in the CMOD application and is completely unrelated to the original data format that CS stores in the index field "CONTENT_TYPE" of the CMOD application group. This means that all items in the same application must be displayed with the same viewer, even if the field "CONTENT_TYPE" is set to different values. This particularity of CMOD only matters if different file formats are stored, as in the case of attachment archiving from a mailbox. It is irrelevant if entire mails, or attachments of an identical type (as in a specific Domino application), are archived.

In the case of e-mail archiving, for instance, the default CS setup links one message type (example: e-mail, calendar entry) to exactly one CMOD configuration triple, consisting of one CMOD folder, one CMOD application group, and one CMOD application. This means that all e-mails go into just one CMOD application group and just one CMOD application, linked to just one viewing application. In the case of attachment archiving, many different file types are archived and "mixed up" in the same application.

A new approach, however, overcomes these configuration issues and allows you to link one message type to many CMOD applications by means of a simple trick (see Appendix A for how to configure CMOD). With this configuration, each document can be stored in the CMOD application that is linked to a certain "CONTENT_TYPE". In each CMOD application, the correct viewing is configured. The CMOD Client comes with a built-in viewer that supports many standard graphic formats and also allows you to start an external viewer application. The built-in viewer allows adding annotations (like highlighting text, etc.). Any annotations, though, are only accessible from the CMOD Client. If the edited document is later retrieved to Lotus Notes, for instance, the annotations are not visible.
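The idea behind this approach is a routing step: the CONTENT_TYPE of each document decides which CMOD application receives it, and each application carries the viewer for its format. The sketch below only illustrates that routing logic in Python; the application names and the fallback are hypothetical, and the actual CMOD configuration is the one described in Appendix A.

```python
# Hypothetical CMOD application names, one per content type; each application
# would be configured with the viewer that suits its format.
APPLICATION_BY_CONTENT_TYPE = {
    "DOC": "ATT_WORD",
    "PDF": "ATT_PDF",
    "TIF": "ATT_TIFF",
    "CSN": "MAIL_NATIVE",
}
FALLBACK_APPLICATION = "ATT_OTHER"  # e.g. configured to launch an external viewer

def cmod_application(content_type: str) -> str:
    """Choose the CMOD application for a document from its CONTENT_TYPE value."""
    return APPLICATION_BY_CONTENT_TYPE.get(content_type.upper(), FALLBACK_APPLICATION)
```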

There is no retrieval from the Archive Client with TSM as the backend repository. The reasons are that TSM does not allow storing metadata and that there is no Client application for search and retrieve.


Retrieval from Lotus Notes

With CSLD, it is also possible to trigger a search in the archive from the Notes Client. For this purpose, CSLD provides a software tool (setupdb.exe) that creates the Notes forms for the query, the hit list and the result. Both query and result forms include the specific attributes defined in the document mapping.

When triggering such a search, the user can enter attribute values with or without wildcards ("%"). From the Notes Client, CSLD allows a search on the full-text index of a CM v8 attribute (but not on the full-text index of the content). Matching documents can always be presented in two ways in Notes:

  • All hits are summarized in a hit list document. From there, each document can be restored individually (or all documents at once).
  • CSLD creates for each hit a separate result document in Notes.
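The wildcard semantics can be illustrated with a small filter over attribute values. The sketch assumes that "%" matches any sequence of characters, as in SQL LIKE, and that matching is case-sensitive; both are assumptions, and the function names are hypothetical. The real search is, of course, executed by the archive, not by client-side filtering.

```python
import re

def wildcard_to_regex(value: str) -> "re.Pattern[str]":
    """Translate a search value with '%' wildcards into an anchored regex."""
    parts = (re.escape(part) for part in value.split("%"))
    return re.compile(".*".join(parts) + r"\Z", re.DOTALL)

def find_hits(documents: list, attribute: str, value: str) -> list:
    """Return the documents whose attribute matches the search value."""
    pattern = wildcard_to_regex(value)
    return [doc for doc in documents if pattern.match(doc.get(attribute, ""))]

docs = [{"Subject": "Invoice 4711"}, {"Subject": "Re: Invoice 4711"}, {"Subject": "Offer"}]
print(find_hits(docs, "Subject", "Invoice%"))   # only the first document
print(find_hits(docs, "Subject", "%Invoice%"))  # the first two documents
```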

With CM, this search is not restricted only to documents that were originally archived from Notes. A CSLD-enabled Domino database may act as a client to CM to search for documents in unrelated index classes (item types) from other sources, like a scanner. In order to enable such a search for documents from a non-Notes source, you must go through the following steps:

  • Identify the index class (item type) in CM and its associated attributes.
  • Create a Notes form with these attributes.
  • Create one sample document for this new Notes form by entering any data for the attributes.
  • Add a document mapping for this Notes form / item type in CM.
  • Modify archint.ini to connect to the correct item type in CM.
  • Copy the sample document into the CSLD configuration.
  • Run setupdb from a command window. This creates the query, result and hit list form for this item type.
  • Restart the CSLD tasks and archpro to activate the modifications.
  • Test the search.

With CM v7, there is an additional user exit that further secures archive access. By default, CSLD always logs on with the same archive user. If you want to restrict access to certain documents by switching the archive user depending on the Notes user, CSLD offers a user exit in which you can implement the desired access restrictions. CSLD provides a C function interface that takes the Notes user ID as an input parameter and returns an archive user ID; you must provide the logic that maps a Notes user ID to an archive user ID. You can switch on the security user exit in the CommonStore configuration file archint.ini for each archive separately: simply add the line "ACCESS_CTRL ON" to each archive ID definition block for which you want to enable the user exit. When the user exit is enabled, any request to update an archived document's index information, delete archived documents, or retrieve archived documents (by single retrieval or by query) is performed on behalf of the user ID provided by the user exit. Note that no switching is done for archiving requests.
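The mapping logic that the exit has to supply, and the rule for which requests use it, can be modeled as follows. This is a Python illustration of the behavior described above, not the C interface itself, whose exact signature is not given here. The group-based mapping, the user names, and the groups_of lookup are all hypothetical.

```python
# Hypothetical mapping data; in a real exit this logic is yours to supply.
ARCHIVE_USER_BY_GROUP = {"HR": "ARCH_HR", "FINANCE": "ARCH_FIN"}
DEFAULT_ARCHIVE_USER = "ARCH_READONLY"

# Requests that run under the mapped user when ACCESS_CTRL ON is set
MAPPED_REQUESTS = {"update_index", "delete", "retrieve", "query"}

def map_notes_user(notes_user: str, groups_of) -> str:
    """The part the exit must implement: Notes user ID -> archive user ID."""
    for group in groups_of(notes_user):
        if group in ARCHIVE_USER_BY_GROUP:
            return ARCHIVE_USER_BY_GROUP[group]
    return DEFAULT_ARCHIVE_USER

def archive_user_for(request: str, notes_user: str, csld_user: str,
                     groups_of, access_ctrl: bool) -> str:
    """Archive user under which a request is performed."""
    if access_ctrl and request in MAPPED_REQUESTS:
        return map_notes_user(notes_user, groups_of)
    # Archiving requests, and all requests when the exit is off, use the CSLD user.
    return csld_user
```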

If users are processing the documents directly from the CM Client, keep in mind that the annotations will not be visible when documents are searched and retrieved from the Notes Client. Only the base document, without annotations, is transferred to Notes.

Another way to restrict access to documents in a certain index class is by using "views" in CM v7, which are also supported by CSLD. These are subsets of an index class by which the archive administrator can hide attributes and documents of an index class from certain users. The default view of each index class is a view on all attributes and documents of an index class. In CM v7, you can create further views that show just a subset of all documents in an index class for a particular CM user. In order to make only a CM v7 view accessible to CSLD, follow these steps:

  • Create a CM v7 view for the index class and make it accessible for the CSLD user defined in the archive section of archint.ini.
  • Remove the access rights of the standard view of the same index class for the CSLD user.

In this way, the CSLD user will pick the CM v7 view with the lowest view ID that is accessible to that user. Since the standard view is not accessible, the user will see just the newly defined CM v7 view. Note that this functionality is not available with CM v8.
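The selection rule is simply "lowest accessible view ID". In the sketch below, the view IDs and the user name are hypothetical, and it is assumed for the example that the standard view has ID 1; removing the CSLD user's access to it makes the custom view the one that is picked.

```python
def pick_view(views: dict, user: str):
    """Return the lowest view ID the user may access, or None if there is none."""
    accessible = [view_id for view_id, users in views.items() if user in users]
    return min(accessible) if accessible else None

before = {1: {"CSLDUSER"}, 7: {"CSLDUSER"}}  # standard view 1 still accessible
after = {1: set(), 7: {"CSLDUSER"}}          # access to the standard view removed
print(pick_view(before, "CSLDUSER"), pick_view(after, "CSLDUSER"))  # 1 7
```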

With CMOD, the search capabilities are slightly different. Remember that the CMOD application group requires several technical attributes to store CSLD-specific information (see table 4). CSLD can only search and retrieve documents that are stored in such a CMOD application group. This is of course the case for all documents that came from Notes. If the document was scanned in or imported in a different way into CMOD, it is necessary that all six technical attributes are defined in the CMOD application group. When importing a document in such a CMOD application group, at least the following fields should be set in addition to the application-related attributes:

  • DOC_ID: Without this attribute, CSLD cannot build the hit list or the result document ("CSS7211E: Error while performing query: The system detected an unspecific error condition (CSNMessenger, createHitlist, docID is empty"). Note that you must create a unique DOC_ID yourself. It is recommended to follow the structure described in table 4. Any duplicate DOC_ID causes an error at search and retrieval time.
  • CONTENT_TYPE: Without a content type, CSLD places the document into Notes as an attachment without a file extension, which causes problems when viewing or launching it. In addition, CSLD Web viewing in the browser might have trouble identifying the correct viewing application.
  • ORIGFILENAME (optional): Without a file name, CSLD creates its own name. Note that the first 19 characters are not part of the file name. Nonetheless, you have to set the first 19 characters (to any value) because an incorrect file name would be retrieved without them.
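When documents are imported into the CMOD application group by other means, the technical attributes have to be filled in by the import process. The sketch below builds them for one document. make_doc_id reproduces the shape of the Table 4 sample (a 16-digit timestamp down to hundredths of a second, "#", four hex digits, ".", four digits); what CSLD actually encodes in the last two parts is not documented here, so treating them as a sequence number and a node number is an assumption, as are the helper names. Fixed-width output requires seq below 0x10000 and node below 10000.

```python
from datetime import datetime

PREFIX_LEN = 19  # ORIGFILENAME characters that are not part of the file name

def make_doc_id(ts: datetime, seq: int, node: int) -> str:
    """Build an ID with the same shape as the Table 4 sample (assumed layout)."""
    stamp = ts.strftime("%Y%m%d%H%M%S") + f"{ts.microsecond // 10_000:02d}"
    return f"{stamp}#{seq:04X}.{node:04d}"

def import_fields(doc_id: str, content_type: str, file_name: str, existing_ids: set) -> dict:
    """Technical attributes for one document imported outside CSLD."""
    if doc_id in existing_ids:
        raise ValueError(f"duplicate DOC_ID {doc_id}")  # would fail at search time
    existing_ids.add(doc_id)
    return {
        "DOC_ID": doc_id,
        "CONTENT_TYPE": content_type,                  # e.g. "PDF"
        "ORIGFILENAME": "0" * PREFIX_LEN + file_name,  # prefix value is arbitrary
        "WORKBASKET": "",                              # must be defined, may stay empty
        "FOLDER": "",
        "ITEMTYPE": "",
    }

print(make_doc_id(datetime(2003, 7, 15, 15, 49, 19, 160000), 0x0C60, 5815))
# 2003071515491916#0C60.5815
```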

All other technical attributes must be defined, but can remain empty. Figure 1 shows an example of a document that is imported by StoreOnDemand. This document can be searched and retrieved later on by CSLD from Notes.

Figure 1. Importing a document into CMOD for later search by CSLD

If users are processing the documents directly from the CMOD Client, keep in mind that the annotations are not visible when documents are searched and retrieved from the Notes Client. Only the base document, without annotations, is transferred to Notes.

If TSM is used as backend repository, no search and retrieve functionality is available since no attribute information can be stored in TSM.


Partial retrieve

With CSSAP, some very large objects are sometimes archived. This is particularly the case for some SAP print lists that can become larger than 1 GB. In order to optimize the retrieval performance, SAP builds indexes that are used to access a subset of a document directly. SAP then requests CSSAP to retrieve a certain data length at a certain data offset in a document. This is also called "partial retrieve". As a result, the user can access the document content much faster.
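A partial retrieve is a positioned read: given an offset and a length, only that byte range is returned. The sketch below shows the idea with a local file standing in for an uncompressed archived item; it does not show how CSSAP talks to the repository, and the function name is hypothetical.

```python
def partial_retrieve(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` without reading the rest of the item."""
    with open(path, "rb") as item:
        item.seek(offset)         # jump directly to the requested position
        return item.read(length)  # may be shorter at the end of the item
```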

CM v7, unfortunately, does not provide an API for partial retrieve. This means that each CSSAP request leads to retrieving the entire document. This has improved with CM v8, which allows partial retrieve. If the requested document has been moved from CM v8 to TSM, though, CM v8 will retrieve the entire document from TSM. So a partial retrieve of 50 KB by CSSAP might nonetheless result in moving 1 GB from tape (TSM) to hard disk (CM v8).

Although CMOD has a built-in partial retrieve, it is of no value in the context of CSSAP since CMOD builds its own index for partial retrieve that is different from the one managed by SAP.

TSM is the only backend repository where CSSAP can do a partial retrieve. Since the software compression in TSM would make the offsets invalid, CSSAP suppresses TSM software compression during archiving.
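Why compression and offsets do not mix can be shown with zlib as a stand-in for TSM software compression (the actual TSM compression method is not described here). The offsets SAP keeps refer to uncompressed bytes, so there is no position in the compressed item to seek to: everything up to the requested range has to be inflated first. Names and data in the sketch are hypothetical.

```python
import zlib

def read_range_compressed(blob: bytes, offset: int, length: int, chunk: int = 4096) -> bytes:
    """Partial read from a zlib-compressed item.

    Offsets refer to the uncompressed data, so the data must be inflated
    from the start until offset + length bytes are available.
    """
    inflater = zlib.decompressobj()
    out = bytearray()
    for start in range(0, len(blob), chunk):
        out += inflater.decompress(blob[start:start + chunk])
        if len(out) >= offset + length:
            break
    else:
        out += inflater.flush()
    return bytes(out[offset:offset + length])

# Synthetic 120,000-byte "print list" of 12-byte numbered lines
print_list = b"".join(f"LINE {i:06d}\n".encode("ascii") for i in range(10_000))
blob = zlib.compress(print_list)
print(len(print_list), len(blob))  # the compressed item has a different size and layout
print(read_range_compressed(blob, 50_004, 12))  # b'LINE 004167\n'
```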

SAP uses partial retrieval also for archived SAP data. SAP data archiving bundles and compresses data from many transactions into one item prior to offloading it by CSSAP. As part of this process, an internal index with the respective data offsets is created. Although the objects themselves are not that large (10 MB), the partial retrieval provides a faster response time when the user wants to access a specific archived transaction.

Due to the technical differences outlined above, it is recommended to use TSM as backend repository for CSSAP when archiving large print lists or doing SAP data archiving.


Deletion

Deletion is a key requirement for records management and document retention in general. This is why CS goes beyond archiving and retrieval. With CSLD, for instance, it is possible to trigger the deletion of items in the archive directly from the Domino environment. The same applies to CSSAP where in some scenarios the deletion can be triggered from the SAPGUI. In the case of CSX, deletion has to be triggered from the archive itself; a deletion directly from Exchange is not possible.

Removal from the archive is a must when an e-mail retention policy is in place. Consider the following example: after 5 years, all e-mails expire and are removed from the Domino server. When the e-mail in the messaging system is deleted, the related archived attachment in the repository should be removed as well. This ensures that the messaging system and the archive always remain in sync. CSLD supports propagating these delete requests into the archive. Let us now analyze how each backend repository processes the deletion request.
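The retention example can be sketched as a loop that deletes expired mails and, for mails with archived content, first propagates the delete to the archive. The callbacks delete_mail and delete_archived are hypothetical stand-ins for the Domino deletion and the CSLD delete request; measuring the five years from the creation date is also an assumption.

```python
from datetime import date

RETENTION_YEARS = 5  # retention period from the example

def expiry_date(created: date, years: int = RETENTION_YEARS) -> date:
    """Date on which a mail created on `created` expires."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:  # 29 February and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)

def propagate_expiry(mails, today: date, delete_mail, delete_archived) -> None:
    """Delete expired mails and, for archived ones, the archived item as well."""
    for mail in mails:
        if expiry_date(mail["created"]) <= today:
            if mail.get("archive_id"):
                delete_archived(mail["archive_id"])  # keep archive and mail system in sync
            delete_mail(mail["id"])
```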

With CM, deletion of an archived item is very straightforward. Both the entry in the CM library (the attributes) and the item itself are removed. In most cases, however, the item is no longer managed by CM itself but has been handed over to TSM. In this case, CM also sends a deletion request to TSM. The paragraph on deletion in TSM below explains how TSM handles the removal.

In CMOD, deletion is handled differently because of its storage concept, in which many e-mails are stored in a single storage object. When CSLD requests a deletion in CMOD, only the entry in the CMOD library is removed. The e-mail itself, which is located somewhere in the middle of the large storage object, remains untouched. Even if millions of e-mails are deleted, CMOD does not free any storage space. CMOD does, however, provide built-in index and data expiration mechanisms that are independent of CS. We recommend using this built-in expiration to implement a proper e-mail retention policy.
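The contrast between CM and CMOD can be made concrete with a small Python model. The classes `TsmStub`, `CmStub`, and `CmodStub` are hypothetical in-memory stand-ins, not product APIs: the CM model removes the library entry and the item, forwarding the deletion to TSM if the item was handed over; the CMOD model keeps many e-mails in one storage object and removes only the index entry.

```python
from dataclasses import dataclass, field

@dataclass
class TsmStub:
    """Hypothetical stand-in for TSM: object id -> stored bytes."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def delete(self, object_id: str) -> None:
        self.objects.pop(object_id, None)

@dataclass
class CmStub:
    """CM model: one library entry (attributes) and one item per e-mail."""
    attributes: dict[str, dict] = field(default_factory=dict)
    local_items: dict[str, bytes] = field(default_factory=dict)  # still managed by CM
    tsm: TsmStub = field(default_factory=TsmStub)                # items handed over to TSM

    def delete(self, item_id: str) -> None:
        self.attributes.pop(item_id, None)  # remove the library entry
        if item_id in self.local_items:
            del self.local_items[item_id]   # remove the item held by CM
        else:
            self.tsm.delete(item_id)        # forward the deletion to TSM

@dataclass
class CmodStub:
    """CMOD model: many e-mails in one storage object, located via an index."""
    index: dict[str, tuple[int, int]] = field(default_factory=dict)
    storage_object: bytearray = field(default_factory=bytearray)

    def store(self, item_id: str, data: bytes) -> None:
        self.index[item_id] = (len(self.storage_object), len(data))
        self.storage_object += data

    def delete(self, item_id: str) -> None:
        self.index.pop(item_id, None)       # only the index entry goes away

# CM: an item handed over to TSM is deleted in TSM as well.
cm = CmStub()
cm.attributes["M1"] = {"subject": "Q3 report"}
cm.tsm.objects["M1"] = b"attachment"
cm.delete("M1")
assert "M1" not in cm.attributes and "M1" not in cm.tsm.objects

# CMOD: deleting every e-mail empties the index but frees no storage.
cmod = CmodStub()
for n in range(1000):
    cmod.store(f"M{n}", b"x" * 100)
for n in range(1000):
    cmod.delete(f"M{n}")
assert not cmod.index and len(cmod.storage_object) == 100_000
```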

There are two aspects of deletion in TSM: one is removing the entry (or entries) in the TSM database, the other is deleting the archived item itself. The latter depends very much on the storage media in use and on whether the media is on-line or off-line. Consider an optical WORM (write-once-read-many) platter: in this case, no deletion can occur. What about a tape, where archived items are written sequentially? Removing one item does not free any space, either.

For this purpose, TSM comes with a built-in reclamation function. Reclamation is the process of consolidating the remaining data from many sequential access volumes onto fewer new sequential access volumes. Administrators can define a reclamation threshold: the percentage of reclaimable space that a sequential access volume must have before the server can reclaim it. Space becomes reclaimable when files expire or are deleted. Reclamation requires the media to be on-line. If a storage medium has been removed from the device (for example, a tape robot), it is off-line and has to be re-inserted before its space can be reclaimed. A deletion request from CS does not immediately remove the TSM database entry, either. Instead, the entry is only marked for deletion; it is TSM expiration processing that actually removes it and thereby reduces the size of the TSM database.
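Both mechanisms can be sketched in Python as a simplified model; the names `Volume`, `reclamation_candidates`, and `TsmDatabase` are hypothetical, not TSM commands or APIs. A volume qualifies for reclamation only if it is on-line and its reclaimable percentage reaches the threshold (treating the threshold as inclusive is an assumption), and a delete request merely marks a database entry until expiration processing removes it.

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    """A sequential access volume (for example, a tape) in a storage pool."""
    name: str
    used: int              # space written to the volume
    reclaimable: int = 0   # space that belongs to expired or deleted files
    online: bool = True    # False once the tape has been removed from the robot

    def reclaimable_pct(self) -> float:
        return 100.0 * self.reclaimable / self.used

def reclamation_candidates(volumes: list[Volume], threshold_pct: float) -> list[str]:
    """On-line volumes whose reclaimable space reaches the threshold (assumed inclusive)."""
    return [v.name for v in volumes
            if v.online and v.reclaimable_pct() >= threshold_pct]

@dataclass
class TsmDatabase:
    """Object id -> True if the entry is marked for deletion."""
    entries: dict[str, bool] = field(default_factory=dict)

    def delete_request(self, object_id: str) -> None:
        if object_id in self.entries:
            self.entries[object_id] = True  # marked only, not yet removed

    def run_expiration(self) -> int:
        """Remove marked entries, shrinking the database; return how many."""
        marked = [oid for oid, is_marked in self.entries.items() if is_marked]
        for oid in marked:
            del self.entries[oid]
        return len(marked)

volumes = [
    Volume("TAPE01", used=100, reclaimable=70),
    Volume("TAPE02", used=100, reclaimable=20),
    Volume("TAPE03", used=100, reclaimable=90, online=False),
]
assert reclamation_candidates(volumes, threshold_pct=60) == ["TAPE01"]

db = TsmDatabase({"A1": False, "A2": False})
db.delete_request("A1")
assert "A1" in db.entries           # still present after the delete request
assert db.run_expiration() == 1 and "A1" not in db.entries
```

Note how TAPE03 is skipped despite having the most reclaimable space: it stays unreclaimed until the tape is re-inserted.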


Appendix A: How to set up CMOD so that CSLD archives an attachment in the application corresponding to its content type

In the standard CMOD configuration for CSLD, all attachments are stored in the same CMOD application. This application is specified in the ARCHIVE section of the archint.ini file, the configuration file for archpro. In each CMOD application, only one viewer can be specified. As a consequence, archived attachments cannot be properly viewed from the CMOD Client if they have different content types.

The following steps describe how to set up CMOD in conjunction with CSLD so that attachments of different content types are stored in different CMOD applications.2

Step 1: Add an application ID field to the CMOD application group

In addition to the standard setup for CSLD, you have to define the CONTENT_TYPE field as an application ID field.

  • On the field information tab, select the CONTENT_TYPE field first and then click on the application ID checkbox.
  • Next, add the mappings for the different content types.

If you have not defined any content type mappings in the CSLD configuration database, CSLD will write the extension of the file attachment into the CONTENT_TYPE field (example: doc, DOC, pdf). Note that this entry might be case-sensitive. If content type mappings are defined in the CSLD configuration database (example: file extension "doc" maps to content type "DOCWORD"), CSLD will write the mapped value (example: "DOCWORD") into the CONTENT_TYPE field.
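The rule for the value that ends up in the CONTENT_TYPE field can be summarized in a short Python sketch. `content_type_value` is a hypothetical helper, not part of CSLD. The sketch looks up mappings with an exact, case-sensitive match, and falling back to the raw extension when mappings exist but the extension is unmapped is an assumption the text does not confirm.

```python
def content_type_value(filename: str, mappings: dict[str, str]) -> str:
    """Value written into CONTENT_TYPE for an attachment (simplified model).

    Without a mapping, the file extension is written as-is, so "doc" and
    "DOC" stay distinct. With a mapping for the extension, the mapped value
    is written instead. The fallback for unmapped extensions is an assumption.
    """
    extension = filename.rsplit(".", 1)[1] if "." in filename else ""
    return mappings.get(extension, extension)

assert content_type_value("report.doc", {}) == "doc"
assert content_type_value("REPORT.DOC", {}) == "DOC"
assert content_type_value("report.doc", {"doc": "DOCWORD"}) == "DOCWORD"
```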

You should add a mapping for each content type. Figure 2 shows the field information panel in the CMOD application group after adding several mappings.

Note that more mappings can be added at a later time through updating the CMOD application group.

Figure 2: Field information panel in the CMOD application group

Step 2: Add a CMOD application for each content type

For each content type, you have to add a separate CMOD application. You could have, for example, the following CMOD applications: CSLDMailDOC, CSLDMailPDF, CSLDMailGIF, CSLDMailJPG, and so on. All of them are linked to the single CMOD application group that was defined in Step 1.

When adding a CMOD application, you have to define the condition (the application ID) that determines which documents are archived into this application. In the example in Figure 3, only files whose CONTENT_TYPE is GIF are archived into the application CSLDMailGIF.

Figure 3. Defining a CMOD application for a specific content type

Next, go to the "View information" tab to define the corresponding viewer. You can select either one of the viewers built into the CMOD Client or an external application. To use an external viewer, select "User defined" as the data type and enter the file extension. This file extension is used on the client workstation to determine which application is launched. Figure 4 shows an example of the viewer configuration for a Word document. Word documents cannot be viewed within the CMOD Client itself. With "User defined" as the data type and "DOC" as the extension, the CMOD Client automatically launches Microsoft Word or the Microsoft Word Viewer when displaying a document archived in this CMOD application.

Figure 4. Selecting the viewing mode corresponding to the content type

Concluding remarks

  • Do not modify the settings in the archint.ini file. The CMOD application defined there is ignored.
  • The oldest CMOD application (the one that was defined first) that is linked to the CMOD application group is the default application.
  • If during archiving a file extension is encountered where no mapping is defined yet in the CMOD application group, no error occurs. Instead, the document is archived in the default CMOD application.
  • If during archiving a mapping is encountered, but no linked CMOD application, no error occurs. Instead, the document is archived in the default CMOD application.
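The routing rules in these remarks can be condensed into a Python sketch. `select_application` and the data structures it takes are a hypothetical, simplified model of the CMOD application group and its linked applications, not a CMOD API; the application names follow the examples from Step 2.

```python
def select_application(content_type: str,
                       group_mappings: dict[str, str],
                       applications: list[tuple[str, str]]) -> str:
    """Pick the CMOD application for a document (simplified model).

    group_mappings: CONTENT_TYPE value -> application ID value, as mapped
                    in the application group (Step 1).
    applications:   (application name, application ID) pairs linked to
                    the group, in the order they were defined (Step 2).
    """
    default = applications[0][0]   # the oldest linked application
    app_id = group_mappings.get(content_type)
    if app_id is None:
        return default             # no mapping: default application, no error
    for name, linked_id in applications:
        if linked_id == app_id:
            return name
    return default                 # mapping but no linked application: default

mappings = {"DOC": "DOC", "PDF": "PDF", "GIF": "GIF", "JPG": "JPG"}
apps = [("CSLDMailDOC", "DOC"), ("CSLDMailPDF", "PDF"), ("CSLDMailGIF", "GIF")]
assert select_application("GIF", mappings, apps) == "CSLDMailGIF"
assert select_application("TIF", mappings, apps) == "CSLDMailDOC"  # no mapping
assert select_application("JPG", mappings, apps) == "CSLDMailDOC"  # no application
```

Returning the default instead of raising an error mirrors the behavior described above: unmapped or unlinked content types never block archiving, but they do land in the default application with its single viewer.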

Footnotes

1 The most frequent product names are abbreviated in this article as follows:
CS = DB2 CommonStore
CSSAP = DB2 CommonStore for SAP
CSLD = DB2 CommonStore for Lotus Domino
CSX = DB2 CommonStore for Microsoft Exchange
CM = DB2 Content Manager
CMOD = DB2 Content Manager OnDemand
TSM = Tivoli Storage Manager
2 Special thanks to Hanspeter Kaeppeli (IBM Switzerland) and Flemming Floor (IBM Denmark) who made me familiar with the concept of application IDs in CMOD and triggered the new approach described below.

Published: November 19, 2003