Document storage

This section is organized as follows:
  • Defining document storage management
  • Migrating documents
  • Removing documents
Note: This section describes how to do some of the storage management tasks, but you will need other Content Manager OnDemand information and your archive storage manager product information to do others.

Defining document storage management

The document storage management definitions that you create on the library server determine where and when Content Manager OnDemand stores documents and how it maintains them.

Figure 1 shows how the components of document storage management work together to load documents and migrate them from one storage location to another.
Figure 1. Document storage management

When you load a document into Content Manager OnDemand, you assign it to an application group. The application group is the last document storage management component that you define, because it requires a storage set and storage manager definitions, which you must create first. You assign each application group to a storage set, and the application group's load policy identifies that storage set and determines where documents are loaded. The nodes in the storage set identify the object server on which documents are loaded. You can load documents into cache storage, into archive storage, or into both.

The cache storage manager maintains documents temporarily on disk. It uses a list of file systems to determine the devices that are available to store and maintain documents. The archive storage manager maintains documents on optical and tape storage. It uses devices, a device class, a storage pool, a management class, and an archive copy group to determine where to store documents and how long to maintain them.

Depending on the load policy, documents can remain where the loading program put them for the number of days that is specified by the migration policy. After a document ages for the specified number of days, the migration process can copy it from cache storage to archive storage.

Application groups

The application group is the last component that you must define because it requires a storage set and storage manager definitions. The application group provides a way to group related documents. All documents in the application group are loaded into the storage nodes that are part of the storage set to which the application group is assigned. All documents in the application group migrate according to the rules that are defined for the application group's migration policy.

Use the administrative client to create the application groups that determine the document storage for your documents. You typically define one application group for each set of documents that have similar storage requirements: for example, documents that must be retained for a specific length of time, in specific storage locations, and on specific types of media.

Load policy

A load policy contains the rules for loading documents into an application group. It requires one or more storage sets, which you must create first. The load policy determines if documents are loaded into cache storage, archive storage, or both. If the load policy causes documents to be stored only in cache storage, then the migration policy specifies when (or if) documents are copied to archive storage.

You define the load policy when you create the application group. The following properties on the Storage Management page comprise the load policy:
  • Storage Set. Determines where documents will be loaded.
    Note: If you specify Cache Only, then documents can be loaded into cache storage only.
  • Cache Data. Determines if documents will be loaded into cache storage.
    Note: If the storage set is a cache-only storage set, then documents must be loaded into cache storage.
  • Migrate Data from Cache. If you specify When Data is Loaded, then documents are loaded into archive storage at load time, and migration is disabled for the application group.
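Because these three settings interact, it can help to view them as one small decision function. The following Python sketch is a simplified model, not part of Content Manager OnDemand: the `Migrate` enum, the `LoadPolicy` class, and the `load_targets` function are hypothetical names. It also assumes that a policy that stores documents neither in cache nor in archive storage at load time is invalid.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Migrate(Enum):
    """Values of the Migrate Data from Cache setting (hypothetical names)."""
    NO = "No"
    WHEN_DATA_IS_LOADED = "When Data is Loaded"
    NEXT_CACHE_MIGRATION = "Next Cache Migration"
    AFTER_N_DAYS = "After n Days in Cache"


@dataclass(frozen=True)
class LoadPolicy:
    cache_only: bool   # True if the Storage Set is a Cache Only storage set
    cache_data: bool   # the Cache Data setting
    migrate: Migrate   # the Migrate Data from Cache setting


def load_targets(policy: LoadPolicy) -> set[str]:
    """Return where documents are stored at load time: "cache", "archive", or both."""
    if policy.cache_only:
        # A cache-only storage set can hold documents only in cache storage.
        return {"cache"}
    targets: set[str] = set()
    if policy.cache_data:
        targets.add("cache")
    if policy.migrate is Migrate.WHEN_DATA_IS_LOADED:
        # Documents go to archive storage at load time; migration is disabled.
        targets.add("archive")
    if not targets:
        # Assumption: a policy that stores documents nowhere is rejected.
        raise ValueError("load policy does not store documents anywhere")
    return targets


if __name__ == "__main__":
    policy = LoadPolicy(cache_only=False, cache_data=True,
                        migrate=Migrate.WHEN_DATA_IS_LOADED)
    print(sorted(load_targets(policy)))  # ['archive', 'cache']
```

With Cache Data set to Yes and Migrate Data from Cache set to When Data is Loaded, the model places documents in both locations, which matches the "both cache storage and archive storage" option described earlier.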

Migration policy

Migration is the process of copying documents from cache storage to archive storage as controlled by the rules of the application group's migration policy. However, the fact that a document is eligible to be migrated does not mean that it will be migrated. Other factors affect migration, such as how often you run migration processing; migration cannot take place until you run it.

A migration policy contains the rules for migrating the documents in an application group. Migration requires an archive storage manager and its associated devices, storage pools, and so forth, which you must install and configure before you begin migrating documents.

The migration policy determines how long a document stays in cache storage and, through the storage set, where the document will be copied to next. The client node in the storage set identifies the next location.

You define the migration policy when you create the application group. The following settings on the Storage Management page comprise the migration policy:
  • Storage Set. Determines the next location for documents.
    Note: If you specify Cache Only, then migration is disabled for the application group.
  • Migrate Data From Cache. Determines when documents are eligible to be migrated.
    Note: If you specify No or When Data is Loaded, then migration is disabled for the application group.

Cache storage manager

The cache storage manager is the interface between the object server and the disk storage system. The cache storage manager maintains documents temporarily on disk. Before loading documents, you must identify the file systems that the cache storage manager can use to store and maintain documents. You must define at least one storage set for each object server. Documents migrate from cache storage to archive storage based on the migration policy that is defined for the application group. The cache storage manager can delete documents after they exceed the Cache Document Data for n Days period or the Life of Data period, whichever occurs first. See Removing documents for more information.

Archive storage manager

The archive storage manager is the interface between the object server and an optical or tape storage system. The archive storage manager maintains a backup or long-term copy of documents. Before loading documents, you must configure your archive storage devices and define storage pools, client nodes, and management classes to the archive storage manager. The management class determines how long documents remain in archive storage. The archive storage manager can delete documents after they exceed the Retention Value specified for the management class. See Removing documents for more information.

Migrating documents

Content Manager OnDemand provides automatic migration to copy documents from cache storage to archive storage (for documents that were not loaded to archive storage) and to make documents eligible for deletion to maintain free space in cache storage file systems. Migration helps to ensure that there is sufficient free space in the cache storage file systems, where faster devices can provide the most benefit to your users.
Important: If you use migration to copy documents to archive storage (that is, you do not load documents directly to archive storage), then you should run migration processing on a regular schedule so that a backup copy of your documents is created as soon as practical. If you defer migration and cache storage becomes corrupted, then you could be left without a backup copy of your documents.

You control automatic migration processing by scheduling the ARSMAINT program to run with the appropriate options. See ARSMAINT for details about the options. See your operating system information for details about how to schedule tasks. You can also manually start migration processing by running the ARSMAINT program from the prompt.

The ARSMAINT program uses an application group's migration policy to control when migration for an application group occurs:
  • If you use Next Cache Migration to control when migration for an application group occurs, then the application group's documents are migrated each time that you run the ARSMAINT program with the appropriate options.
  • If you use After n Days in Cache to control when migration for an application group occurs, then a document must be stored in cache storage for at least the specified number of days before it is eligible to be migrated.
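The eligibility rules above, together with the notes on the migration policy settings, can be modeled as a pair of small functions. This is a hypothetical Python sketch, not an OnDemand interface: `run_date` stands in for the date that migration processing uses (the default is "today", per message 110), and counting "at least n days" as whole calendar days between the load date and the run date is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Migrate(Enum):
    """Values of the Migrate Data from Cache setting (hypothetical names)."""
    NO = "No"
    WHEN_DATA_IS_LOADED = "When Data is Loaded"
    NEXT_CACHE_MIGRATION = "Next Cache Migration"
    AFTER_N_DAYS = "After n Days in Cache"


@dataclass(frozen=True)
class MigrationPolicy:
    cache_only: bool                     # Storage Set is a Cache Only set
    migrate: Migrate
    days_in_cache: Optional[int] = None  # n; used only with AFTER_N_DAYS


def migration_enabled(policy: MigrationPolicy) -> bool:
    """Cache Only, No, and When Data is Loaded all disable migration."""
    if policy.cache_only:
        return False
    return policy.migrate in (Migrate.NEXT_CACHE_MIGRATION, Migrate.AFTER_N_DAYS)


def eligible_for_migration(policy: MigrationPolicy, loaded_on: date,
                           run_date: date) -> bool:
    """Return True if a document loaded on loaded_on can migrate on run_date."""
    if not migration_enabled(policy):
        return False
    if policy.migrate is Migrate.NEXT_CACHE_MIGRATION:
        # Eligible at the next migration run, regardless of age.
        return True
    if policy.days_in_cache is None:
        raise ValueError("After n Days in Cache requires a value for n")
    # The document must have been in cache storage for at least n days.
    return (run_date - loaded_on).days >= policy.days_in_cache
```

Eligibility is only half of the story: as noted above, nothing is copied until the ARSMAINT program actually runs.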

The ARSMAINT program migrates documents from each cache storage file system listed in the cache storage file system file.

The cache storage space that migrated documents occupy can be reclaimed by the cache storage manager after expiration processing completes. After you run migration processing, you should run expiration processing so that the cache storage manager can reclaim the cache storage space occupied by migrated documents.

Figure 2 shows an example of migration processing.
Figure 2. Migration Processing

In the example, assume that you have never run migration processing on this particular cache storage file system. The box on the left shows the cache storage file system before migration processing begins. It is quite full. (To get here, you would have had to ignore all of the "full cache file system" messages in the Content Manager OnDemand system log.) The box in the middle shows what happens during migration, which is the process of copying documents that are eligible to be migrated to archive storage. The box on the right shows the cache storage file system when migration processing completes. The cache storage file system is still full; however, some two-thirds of the documents are now eligible to be removed. You need to run expiration processing to remove documents from the cache storage file system and reclaim at least some of the space occupied by migrated documents. After expiration processing completes, you will have free space available in the cache storage file system to load additional documents.

Migration processing in the system log

When you run the ARSMAINT program, it saves messages about its activities in the system log. The types of messages saved in the system log depend on the options that you specify when you run the ARSMAINT program. The number of messages saved in the system log during a migration process depends on the options that you specify for the ARSMAINT program, the number of application groups and segments of data processed, and the number of cache storage file systems defined on the server. You will see one set of messages for each object server on which you run the ARSMAINT program. Table 1 lists the messages you could see in the system log following migration processing.
Table 1. Messages from the ARSMAINT program in the Content Manager OnDemand System Log
Each entry gives the message number, the message information (variable fields in parentheses), and an explanation.
  • Message 110: Cache Migration (Date) (Server). About to begin cache migration on the specified server. Migration processing uses the specified date (the default is "today").
  • Message 197: Cache Migration (ApplGrp) (ObjName) (Server). One of these messages is written for each storage object migrated to archive storage. Migration copies a storage object if its "After n Days in Cache" period has passed or if the application group uses the "Next Cache Migration" migration method.
  • Message 124: Filesystem Statistics (filesystem) (% full) (server). One of these messages is written for each cache file system on the server. Informational only; reports the percentage of space used in the file system.
Important: In addition to the messages listed in Table 1, you should monitor the system log every day for messages that indicate your cache storage file systems are becoming full. The ARSMAINT program automatically saves a message in the system log when the amount of space used in a cache storage file system exceeds a threshold. The default threshold is 95%. You can specify a different threshold by using the -f parameter when you run the ARSMAINT program.
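The ARSMAINT program writes this threshold message itself. If you also want an independent check, for example from an external monitoring job, the following minimal Python sketch uses the standard library's `shutil.disk_usage`. The function names are hypothetical, and the sketch only reports file systems above the threshold; it does not write to the Content Manager OnDemand system log.

```python
from __future__ import annotations

import shutil
from typing import Iterable

# Mirrors the ARSMAINT default threshold described above.
DEFAULT_THRESHOLD_PCT = 95.0


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def full_file_systems(paths: Iterable[str],
                      threshold_pct: float = DEFAULT_THRESHOLD_PCT
                      ) -> list[tuple[str, float]]:
    """Return (path, percent used) for each path whose usage exceeds threshold_pct."""
    flagged: list[tuple[str, float]] = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.total, usage.used)
        if pct > threshold_pct:
            flagged.append((path, pct))
    return flagged
```

You would pass the mount points listed in your cache storage file system file; for a hypothetical `/arscache/cache1`, `full_file_systems(["/arscache/cache1"])` returns an empty list while that file system is at or below 95% used.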

Removing documents

Documents expire (are eligible for removal) because their cache expiration date or archive retention period has passed. Expired documents can then be removed by the storage managers. The cache storage manager identifies documents for removal by using the application group's expiration policy and high and low expiration thresholds. The archive storage manager marks documents for removal based on the criteria defined in the archive copy group.

Documents expire from cache storage when they reach their cache expiration date. If a document's cache expiration date falls before the end of its Life of Data period, then the document is removed only from cache storage, and subsequent requests for the document are satisfied by the archive storage manager. When the document reaches the end of its Life of Data period, information about it is removed from the Content Manager OnDemand database, and the document can no longer be retrieved. When the document's archive retention period has passed, information about it is removed from the archive storage manager database.

The fact that a document is eligible to be removed does not mean that it will be deleted from storage. The cache storage manager does not delete expired documents from storage until expiration processing runs. During expiration processing, the archive storage manager deletes information about expired documents from its database. However, the actual documents remain on archive media until the space that they occupy is reclaimed.

Important: Content Manager OnDemand and the archive storage manager delete documents independently of each other. Each uses its own criteria to determine when documents expire and should be removed from the system, and each uses its own utilities to remove documents. However, for final removal of documents from the system, you should specify the same criteria to Content Manager OnDemand and the archive storage manager: the Life of Data, which is used by Content Manager OnDemand, and the Retention Period, which is used by the archive storage manager, should be the same value.
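To see why the two values should match, the following hypothetical Python sketch computes both removal dates and describes what a mismatch implies. It assumes, for simplicity, that both periods are counted from the same date, which may not hold in your configuration; the function names are illustrative only.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Optional


def removal_dates(stored_on: date, life_of_data_days: int,
                  retention_days: int) -> tuple[date, date]:
    """Return (OnDemand removal date, archive expiration date).

    Simplifying assumption: both periods are counted from stored_on.
    """
    return (stored_on + timedelta(days=life_of_data_days),
            stored_on + timedelta(days=retention_days))


def retention_mismatch(life_of_data_days: int,
                       retention_days: int) -> Optional[str]:
    """Describe the effect of Life of Data differing from the Retention Period."""
    if retention_days < life_of_data_days:
        return ("archive copy can expire while Content Manager OnDemand "
                "still lists the document")
    if retention_days > life_of_data_days:
        return ("archive copy is kept after Content Manager OnDemand can "
                "no longer retrieve the document")
    return None  # values match: both managers finish removal together
```

A shorter Retention Period risks losing the archive copy of a document that users can still search for; a longer one keeps archive storage occupied by documents that can no longer be retrieved.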

Removing documents from cache storage

The expiration policy determines when documents are eligible for deletion from cache storage. You define the expiration policy when you create the application group. The following properties on the Storage Management page comprise the expiration policy:
  • Cache Document Data For n Days. The length of time in days to keep documents in cache storage. The documents include documents that are already in the cache and any documents that are subsequently loaded. After a document reaches this value, it is eligible to be deleted from cache storage.
  • Life of Data. The length of time in days to maintain documents on the system.
    Note: If you specify Never Expire, then expiration processing is disabled for the application group.
  • Expiration Type. Determines whether one or more documents are eligible to be deleted at a time. For example, the Segment expiration type means that a segment of data (by default, 10 million documents) can be deleted at a time.
    Note: Up to now, this section has mostly discussed documents and reports, which are the data objects that most people associate with the Content Manager OnDemand system. However, administrators who maintain the system work primarily with segments, which represent many documents, and storage objects, which are containers of compressed documents that are maintained by the storage managers.
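The interaction between Cache Document Data for n Days and Life of Data (whichever period ends first, as noted for the cache storage manager) can be sketched as follows. This hypothetical Python model works per document rather than per segment or storage object, ignores Expiration Type, and does not model Never Expire.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ExpirationPolicy:
    cache_days: int         # Cache Document Data for n Days
    life_of_data_days: int  # Life of Data (Never Expire is not modeled)


def cache_eligible_on(policy: ExpirationPolicy, stored_on: date) -> date:
    """First date on which the cache copy may be deleted.

    Whichever of the two periods ends first applies.
    """
    days = min(policy.cache_days, policy.life_of_data_days)
    return stored_on + timedelta(days=days)


def eligible_for_cache_deletion(policy: ExpirationPolicy, stored_on: date,
                                run_date: date) -> bool:
    """True once the document has reached its cache expiration date."""
    return run_date >= cache_eligible_on(policy, stored_on)
```

Eligibility alone does not free any space: as the next paragraphs explain, the cache storage manager deletes eligible documents only when expiration processing runs.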

The cache storage manager does not delete expired documents from cache storage until expiration processing runs. The ARSMAINT program is the expiration utility. You can schedule the ARSMAINT program to run automatically or you can run it manually. You should make sure that the ARSMAINT program runs periodically so that the cache storage manager can reclaim the space that is occupied by expired documents.

You control automatic expiration processing by scheduling the ARSMAINT program to run with the appropriate options. For details about the options, see ARSMAINT. For details about how to schedule tasks, see your operating system information. You can also manually start expiration processing by running the ARSMAINT program from the prompt.

The ARSMAINT program uses expiration thresholds to control when expiration processing begins and ends. The thresholds are set as levels of the space that is used in a cache storage file system, expressed as a percent of total space available in the file system. For each cache storage file system, the ARSMAINT program compares the high threshold with a calculation of the amount of data stored in the file system as a percent of the actual data capacity of the storage volumes that belong to the file system. When the amount of data stored in a cache storage file system exceeds the high threshold, expiration begins. The ARSMAINT program deletes documents from the file system until the amount of space used in the cache storage file system falls below the low expiration threshold. The ARSMAINT program expires documents from each cache storage file system listed in the cache storage file system file. You can use the defaults for the expiration thresholds, or you can change the threshold values to identify the minimum and maximum amount of space for your cache storage file systems.
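The threshold logic in this paragraph, combined with the oldest-first order described in the Figure 3 example, can be sketched as below. This hypothetical Python model is not the ARSMAINT implementation: it treats used space as the sum of the stored objects' sizes, starts when usage is equal to or exceeds the high threshold (as in the Figure 3 walkthrough), and deletes all eligible objects that share the oldest date before re-checking the low threshold.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class StorageObject:
    name: str
    size: int          # bytes occupied in the cache storage file system
    stored_on: date    # used to find the oldest documents first
    expires_on: date   # cache expiration date from the expiration policy


def expire_cache(objects: list[StorageObject], capacity: int, run_date: date,
                 high_pct: float, low_pct: float) -> list[StorageObject]:
    """Return the eligible objects that expiration would delete, oldest date first."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    used = sum(o.size for o in objects)
    # Expiration starts only when usage is equal to or above the high threshold.
    if 100.0 * used / capacity < high_pct:
        return []
    eligible = sorted((o for o in objects if o.expires_on <= run_date),
                      key=lambda o: o.stored_on)
    deleted: list[StorageObject] = []
    # Delete all objects with the oldest date, then re-check the low threshold.
    for _, same_date in groupby(eligible, key=lambda o: o.stored_on):
        batch = list(same_date)
        deleted.extend(batch)
        used -= sum(o.size for o in batch)
        if 100.0 * used / capacity < low_pct:
            break  # usage has fallen below the low threshold
    return deleted
```

For a 100-byte file system holding three expired 30-byte objects stored on consecutive days, a high threshold of 80% and a low threshold of 50% delete the two oldest objects and stop at 30% used, leaving the third eligible object in cache, much like the partially expired file system in Figure 3.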

Figure 3 shows an example of expiration processing.
Figure 3. Expiration Processing
This example uses the cache storage file system from the migration example in Figure 2. Some two-thirds of the file system contains documents that are eligible to be removed. When you run the ARSMAINT program, it first determines that the amount of space used in the cache storage file system is equal to or exceeds the high threshold. The ARSMAINT program can then begin deleting documents from the file system, beginning with the oldest documents. After the ARSMAINT program deletes the documents that have the oldest date, it checks the low expiration threshold. If the amount of space that is used in the file system is now below the low expiration threshold, then expiration ends. If not, then the ARSMAINT program deletes the next oldest documents, and the process continues. In the example, expiration processing ends before all of the eligible documents have been removed. That's typically OK for two reasons:
  • The expiration process has probably reclaimed enough space to load new documents. (In our example, that is certainly true; some 40 percent of the cache storage file system is now free space.) If not, you need to check your high and low thresholds or add more storage volumes to the cache storage file system.
  • The fact that a document is eligible to be removed from cache storage does not always mean that removing it is a good idea. For example, suppose you keep a document in cache storage for 60 days and in archive storage for two years. After 60 days, the document is eligible to be removed from cache storage. However, your users continue to access the document on a regular basis for 90, or even 120, days. With the right set of high and low thresholds, the document is likely to remain in cache storage for another 30 or more days beyond its cache expiration date, where faster devices can provide the most benefit to your users. (Of course, you could just change the expiration policy, but that's another story.)

Expiration processing in the system log

When you run the ARSMAINT program, it saves messages about its activities in the system log. The types of messages saved in the system log depend on the options that you specify when you run the ARSMAINT program. The number of messages saved in the system log each time that expiration processing runs depends on the options that you specify for the ARSMAINT program, the number of application groups and segments of data processed, and the number of cache storage file systems defined on the server. You will see one set of messages for each object server on which you run the ARSMAINT program. Table 2 lists the messages you could see in the system log following expiration processing.
Table 2. Messages from the ARSMAINT program in the Content Manager OnDemand System Log
Each entry gives the message number, the message information (variable fields in parentheses), and an explanation.
  • Message 109: Cache Expiration (Date) (Min%) (Max%) (Server). About to begin cache expiration processing on the specified server. Expiration processing uses the specified date (the default is "today"). Expiration processing begins on each cache file system whose used space exceeds the Max% (default 80%) and ends when the space used in the file system falls below the Min% (default 80%).
  • Message 196: Cache Migration (ApplGrp) (ObjName) (Server). One of these messages is written for each storage object deleted from cache storage. A storage object is eligible to be deleted when its "Cache Document Data for n Days" or "Life of Data" period has passed, whichever occurs first.
  • Message 124: Filesystem Statistics (filesystem) (% full) (server). One of these messages is written for each cache storage file system on the server. Informational only; reports the percentage of space used in the file system.
Important: In addition to the messages listed in Table 2, you should monitor the system log every day for messages that indicate that your cache storage file systems are becoming full. The ARSMAINT program automatically saves a message in the system log when the amount of space used in a cache storage file system exceeds a threshold. The default threshold is 95%. You can specify a different threshold by using the -f parameter when you run the ARSMAINT program.

Removing documents from archive storage

Important: Removing a document from archive storage means that the backup or long-term copy of the document will be deleted from the system. You typically remove documents from archive storage when you no longer have a business or legal requirement to keep them.
A management class contains an archive copy group that specifies the criteria that make a document eligible for deletion. Documents become eligible for deletion under the following conditions:
  • Administrators delete documents from client nodes
  • An archived document exceeds the time criteria in the archive copy group (how long archived copies are kept)

The archive storage manager does not delete information about expired documents from its database until expiration processing runs. You can run expiration processing either automatically or manually by command. You should make sure that expiration processing runs periodically to allow the archive storage manager to reuse storage pool space that is occupied by expired documents. When expiration processing runs, the archive storage manager deletes information about expired documents from its database. The storage space that these documents occupy then becomes reclaimable. See Reclaiming space in storage pools for more information.

You control automatic expiration processing by using the expiration processing interval (EXPINTERVAL) in the server options file (dsmserv.opt). You can set the option by editing the dsmserv.opt file (see the Installation and Configuration Guide for details).

If you use the server option to control when expiration processing occurs, the archive storage manager runs expiration processing each time that you start the server. After that, it runs expiration processing at the interval that you specified with the option, measured from the start time of the server.
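The interval behavior can be sketched as a list of run times. In this hypothetical Python model, `interval_hours` stands in for the EXPINTERVAL value; treating it as a number of hours is an assumption, so confirm the unit in your archive storage manager documentation. An interval of 0 returns no runs, matching the advice below for scheduling EXPIRE INVENTORY yourself.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def expiration_run_times(server_start: datetime, interval_hours: int,
                         until: datetime) -> list[datetime]:
    """Times at which the server would run expiration processing, up to `until`.

    interval_hours models the EXPINTERVAL server option (unit assumed: hours).
    An interval of 0 disables automatic expiration processing.
    """
    if interval_hours < 0:
        raise ValueError("interval must not be negative")
    if interval_hours == 0:
        return []
    runs: list[datetime] = []
    t = server_start  # the first run happens when the server starts
    while t <= until:
        runs.append(t)
        t += timedelta(hours=interval_hours)  # later runs measured from server start
    return runs
```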

You can manually start expiration processing by issuing the EXPIRE INVENTORY command. Expiration processing then deletes information about expired files from the database. You can schedule this command by using the DEFINE SCHEDULE command. If you schedule the EXPIRE INVENTORY command, set the expiration interval to 0 (zero) in the server options so that the archive storage manager does not run expiration processing when you start the server. You can control how long the expiration process runs by using the DURATION parameter with the EXPIRE INVENTORY command.

Reclaiming space in storage pools

Space on a storage pool volume becomes reclaimable as documents expire or are deleted from the volume. For example, documents become obsolete as they age past the time criteria in the archive copy group.

The archive storage manager reclaims the space in storage pools based on a reclamation threshold that you can set for each storage pool. When the percentage of space that can be reclaimed on a volume rises above the reclamation threshold, the archive storage manager reclaims the volume. The archive storage manager rewrites documents on the volume to other volumes in the storage pool, making the original volume available for new documents.

The archive storage manager checks whether reclamation is needed at least once per hour and begins space reclamation for eligible volumes. You can set a reclamation threshold for each storage pool when you define or update the storage pool.

During reclamation, the archive storage manager copies the files to volumes in the same storage pool unless you have specified a reclamation storage pool. Use a reclamation storage pool to allow automatic reclamation for a storage pool with only one drive. See your archive storage manager documentation for details.

After the archive storage manager moves all documents to other volumes, one of the following occurs for the reclaimed volume:
  • If you have explicitly defined the volume to the storage pool, the volume becomes available for reuse by that storage pool
  • If the volume was acquired as a scratch volume, the archive storage manager deletes the volume from its database
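The reclamation rules in this section reduce to a threshold comparison per volume plus a rule for what happens to the emptied volume. The Python sketch below models them with hypothetical names; it does not model the reclamation storage pool, the hourly check, or the actual movement of documents.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    reclaimable_pct: float  # share of the volume held by expired or deleted documents
    scratch: bool           # True if acquired as a scratch volume, False if explicitly defined


def volumes_to_reclaim(volumes: list[Volume], threshold_pct: float) -> list[Volume]:
    """Volumes whose reclaimable space has risen above the pool's reclamation threshold."""
    return [v for v in volumes if v.reclaimable_pct > threshold_pct]


def disposition(volume: Volume) -> str:
    """What happens to a volume after all of its documents have been moved elsewhere."""
    if volume.scratch:
        return "deleted from the archive storage manager database"
    return "available for reuse by its storage pool"
```

With a threshold of 60%, volumes that are 70% and 65% reclaimable are selected and a 40% volume is left alone; the explicitly defined volume returns to its pool, while the scratch volume is deleted from the database.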

Important: See your archive storage manager documentation for more information about reclamation processing, including choosing a reclamation threshold, reclaiming volumes in a storage pool with one drive, reclamation for WORM optical media, reclamation for copy storage pools, and reclamation of off-site volumes.