Document storage
- Defining document storage management
- Migrating documents
- Removing documents
Defining document storage management
The document storage management definitions that you create on the library server determine where and when Content Manager OnDemand stores documents and how it maintains them.
When you load a document into Content Manager OnDemand, you assign it to an application group. The application group is the last document storage management component that you define, because it requires a storage set and storage manager definitions, which you must create first. You assign each application group to a storage set. The load policy identifies the storage set and determines where documents are loaded, and the nodes in the storage set identify the object server on which documents are loaded. You can load documents into cache storage, into archive storage, or into both. The cache storage manager maintains documents temporarily on disk and uses a list of file systems to determine the devices available to store and maintain documents. The archive storage manager maintains documents on optical or tape storage and uses devices, a device class, a storage pool, a management class, and an archive copy group to determine where to store documents and how long to maintain them. Depending on the load policy, documents can remain where the loading program put them for the number of days that is specified by the migration policy. After a document ages for the specified number of days, the migration process can copy it from cache storage to archive storage.
Application groups
The application group is the last component that you must define because it requires a storage set and storage manager definitions. The application group provides a way to group related documents. All documents in the application group are loaded into the storage nodes that are part of the storage set to which the application group is assigned. All documents in the application group migrate according to the rules that are defined for the application group's migration policy.
Use the administrative client to create the application groups that determine the document storage for your documents. You typically define one application group for each set of documents that have similar storage requirements. For example, you might define one application group for documents that must be retained for a specific length of time, in specific storage locations, on specific types of media.
Load policy
A load policy contains the rules for loading documents into an application group. It requires one or more storage sets, which you must create first. The load policy determines if documents are loaded into cache storage, archive storage, or both. If the load policy causes documents to be stored only in cache storage, then the migration policy specifies when (or if) documents are copied to archive storage.
- Storage Set. Determines where documents will be loaded. Note: If you specify Cache Only, then documents can be loaded into cache storage only.
- Cache Data. Determines if documents will be loaded into cache storage. Note: If the storage set is a cache-only storage set, then documents must be loaded into cache storage.
- Migrate Data from Cache. If you specify When Data is Loaded, then documents are also loaded into archive storage at load time, and migration is disabled for the application group.
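The following Python sketch shows how the three load-policy settings above might combine to decide where a document is written at load time. The `LoadPolicy` class, its field names, and the string values are hypothetical; they mirror the administrative client settings but are not an OnDemand API.

```python
from dataclasses import dataclass

# Hypothetical spelling of the "Migrate Data from Cache" value that writes to archive at load time.
WHEN_DATA_IS_LOADED = "When Data is Loaded"


@dataclass
class LoadPolicy:
    cache_only_storage_set: bool  # the Storage Set is a Cache Only storage set
    cache_data: bool              # Cache Data = Yes
    migrate_from_cache: str       # e.g. "No", "When Data is Loaded", "Next Cache Migration"


def load_targets(policy: LoadPolicy) -> set[str]:
    """Return where documents are written at load time: 'cache', 'archive', or both."""
    if policy.cache_only_storage_set:
        # A cache-only storage set means documents must go to cache storage only.
        return {"cache"}
    targets = set()
    if policy.cache_data:
        targets.add("cache")
    if policy.migrate_from_cache == WHEN_DATA_IS_LOADED:
        # Documents also go to archive storage now, so migration is disabled.
        targets.add("archive")
    if not targets:
        raise ValueError("this policy stores documents nowhere at load time")
    return targets
```

For example, `load_targets(LoadPolicy(False, True, "When Data is Loaded"))` returns both `"cache"` and `"archive"`.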
Migration policy
Migration is the process of copying documents from cache storage to archive storage as controlled by the rules of the application group's migration policy. However, the fact that a document is eligible to be migrated does not mean that it will be migrated. Other factors affect migration, such as how often you run migration processing: migration cannot take place until you run migration processing.
A migration policy contains the rules for migrating the documents in an application group. Migration requires an archive storage manager and its associated devices, storage pools, and so forth, which you must install and configure before you begin migrating documents.
The migration policy determines how long a document stays in cache storage and, through the storage set, where the document will be copied to next. The client node in the storage set identifies the next location.
- Storage Set. Determines the next location for documents. Note: If you specify Cache Only, then migration is disabled for the application group.
- Migrate Data From Cache. Determines when documents are eligible to be migrated. Note: If you specify No or When Data is Loaded, then migration is disabled for the application group.
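The two notes above can be summarized in a small Python check. This is a sketch only: the function name and the string values are hypothetical stand-ins for the administrative client settings.

```python
# Hypothetical setting values that disable migration, spelled as in the administrative client.
MIGRATION_DISABLING_VALUES = {"No", "When Data is Loaded"}


def migration_enabled(cache_only_storage_set: bool, migrate_from_cache: str) -> bool:
    """Apply the Storage Set and Migrate Data From Cache notes to one application group."""
    if cache_only_storage_set:
        # With a Cache Only storage set there is no next location to copy documents to.
        return False
    # "No" and "When Data is Loaded" both disable migration.
    return migrate_from_cache not in MIGRATION_DISABLING_VALUES
```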
Cache storage manager
The cache storage manager is the interface between the object server and the disk storage system. The cache storage manager maintains documents temporarily on disk. Before loading documents, you must identify the file systems that the cache storage manager can use to store and maintain documents. You must define at least one storage set for each object server. Documents migrate from cache storage to archive storage based on the migration policy that is defined for the application group. The cache storage manager can delete documents after they exceed the Cache Document Data for n Days period or the Life of Data period, whichever occurs first. See Removing documents for more information.
Archive storage manager
The archive storage manager is the interface between the object server and an optical or tape storage system. The archive storage manager maintains a backup or long-term copy of documents. Before loading documents, you must configure your archive storage devices and define storage pools, client nodes, and management classes to the archive storage manager. The management class determines how long documents remain in archive storage. The archive storage manager can delete documents after they exceed the Retention Value specified for the management class. See Removing documents for more information.
Migrating documents
You control automatic migration processing by scheduling the ARSMAINT program to run with the appropriate options. See ARSMAINT for details about the options. See your operating system information for details about how to schedule tasks. You can also manually start migration processing by running the ARSMAINT program from the prompt.
- If you use Next Cache Migration to control when migration for an application group occurs, then the documents in the application group are migrated the next time that you start the ARSMAINT program with the appropriate options.
- If you use After n Days in Cache to control when migration for an application group occurs, then a document must be stored in cache storage for at least the specified number of days before it is eligible to be migrated.
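The two migration methods above can be sketched as a single eligibility check in Python. The function name and parameters are hypothetical; the default run date of "today" follows the behavior described for message 110 in the system log.

```python
from datetime import date, timedelta
from typing import Optional


def eligible_for_migration(
    method: str,
    stored_on: date,
    n_days: int = 0,
    run_date: Optional[date] = None,
) -> bool:
    """Decide whether a storage object is eligible on a given migration run."""
    if run_date is None:
        run_date = date.today()  # migration processing defaults to "today"
    if method == "Next Cache Migration":
        # Eligible on the next migration run, regardless of age.
        return True
    if method == "After n Days in Cache":
        # Must have been in cache storage for at least n days.
        return run_date - stored_on >= timedelta(days=n_days)
    raise ValueError(f"unknown migration method: {method}")
```

Eligibility is only half of the picture: as noted earlier, nothing is copied until migration processing actually runs.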
The ARSMAINT program migrates documents from each cache storage file system listed in the cache storage file system file.
Migration copies documents; it does not free cache storage space. The cache storage manager reclaims the space that migrated documents occupy only during expiration processing, so after you run migration processing, you should run expiration processing.
In the example, assume that you have never run migration processing on this particular cache storage file system. The box on the left shows the cache storage file system before migration processing begins. It is quite full. (To get to this point, you would have had to ignore all of the "full cache file system" messages in the Content Manager OnDemand system log.) The box in the middle shows what happens during migration, which is the process of copying documents that are eligible to be migrated to archive storage. The box on the right shows the cache storage file system when migration processing completes. The cache storage file system is still full; however, about two-thirds of the documents are now eligible to be removed. You need to run expiration processing to remove documents from the cache storage file system and reclaim at least some of the space occupied by migrated documents. After expiration processing completes, you will have free storage available in the cache storage file system to load additional documents.
Migration processing in the system log
Message Number | Message Information | Explanation |
---|---|---|
110 | Cache Migration (Date) (Server) | About to begin cache migration on the specified server. Migration processing uses the specified date (the default is "today"). |
197 | Cache Migration (ApplGrp) (ObjName) (Server) | One of these messages for each storage object migrated to archive storage. Migration copies a storage object if its "After n Days in Cache" period has passed or the application group uses the "Next Cache Migration" migration method. |
124 | Filesystem Statistics (filesystem) (% full) (server) | One of these messages for each cache file system on the server. Information only to report the percentage of space used in the file system. |
Removing documents
Documents expire (are eligible for removal) because their cache expiration date or archive retention period has passed. Expired documents can then be removed by the storage managers. The cache storage manager identifies documents for removal by using the application group's expiration policy and high and low expiration thresholds. The archive storage manager marks documents for removal based on the criteria defined in the archive copy group.
Documents expire from cache storage when they reach their cache expiration date. If a document's cache expiration period is shorter than its Life of Data period, the document is simply removed from cache storage, and subsequent requests for the document are satisfied by the archive storage manager. When the document reaches the end of its Life of Data period, information about it is removed from the Content Manager OnDemand database, and the document can no longer be retrieved. When the document's archive retention period has passed, information about it is removed from the archive storage manager database.
The fact that a document is eligible to be removed does not mean that it will be deleted from storage. The cache storage manager does not delete expired documents from storage until expiration processing runs. Similarly, during expiration processing, the archive storage manager deletes only information about expired documents from its database; the actual documents remain on archive media until the space that they occupy is reclaimed.
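The lifecycle above can be sketched in Python as a function that reports where a retrieval request is satisfied at a given document age. It assumes, hypothetically, that the document was loaded into both cache and archive storage and that expiration processing removes the cache copy as soon as it expires; in practice removal waits for expiration processing to run.

```python
from typing import Optional


def retrieval_source(age_days: int, cache_days: int, life_of_data_days: int) -> Optional[str]:
    """Return 'cache', 'archive', or None for a document age_days after loading.

    Assumes the document is in both cache and archive storage and that its
    cache copy disappears the moment its cache period ends.
    """
    if age_days >= life_of_data_days:
        # Index information is gone from the OnDemand database: not retrievable.
        return None
    if age_days < cache_days:
        return "cache"
    # The cache copy has expired; the archive storage manager serves the request.
    return "archive"
```

With a 60-day cache period and a two-year (730-day) Life of Data, a request on day 10 is served from cache, a request on day 60 from archive, and a request on day 730 cannot be satisfied.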
Removing documents from cache storage
- Cache Document Data For n Days. The length of time in days to keep documents in cache storage. The documents include documents that are already in the cache and any documents that are subsequently loaded. After a document reaches this value, it is eligible to be deleted from cache storage.
- Life of Data. The length of time in days to maintain documents on the system. Note: If you specify Never Expire, then expiration processing is disabled for the application group.
- Expiration Type. Determines whether one or more documents are eligible to be deleted at a time. For example, the Segment expiration type means that a segment of data (unless you specify otherwise, 10 million documents) can be deleted at a time. Note: This is the first time that the term segment has been mentioned. Up to now, documents and reports have been discussed, which are the data objects that most people associate with the Content Manager OnDemand system. However, administrators who maintain the system work primarily with segments, which represent many documents, and storage objects, which are containers of compressed documents that are maintained by the storage managers.
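A minimal Python sketch of the first date on which a document becomes eligible for deletion from cache storage, following the "whichever occurs first" rule for Cache Document Data for n Days and Life of Data. The function name is hypothetical, `None` stands in for Never Expire, and the Expiration Type (for example, segment-at-a-time deletion) is deliberately not modeled.

```python
from datetime import date, timedelta
from typing import Optional


def cache_deletion_eligible_on(
    loaded_on: date, cache_days: int, life_of_data_days: Optional[int]
) -> Optional[date]:
    """First date a document can be deleted from cache storage.

    life_of_data_days=None stands for Never Expire, which, per the Life of Data
    note, disables expiration processing, so no date is returned.
    """
    if life_of_data_days is None:
        return None
    # Whichever period passes first controls cache deletion.
    return loaded_on + timedelta(days=min(cache_days, life_of_data_days))
```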
The cache storage manager does not delete expired documents from cache storage until expiration processing runs. The ARSMAINT program is the expiration utility. You can schedule the ARSMAINT program to run automatically or you can run it manually. You should make sure that the ARSMAINT program runs periodically so that the cache storage manager can reclaim the space that is occupied by expired documents.
You control automatic expiration processing by scheduling the ARSMAINT program to run with the appropriate options. For details about the options, see ARSMAINT. For details about how to schedule tasks, see your operating system information. You can also manually start expiration processing by running the ARSMAINT program from the prompt.
The ARSMAINT program uses expiration thresholds to control when expiration processing begins and ends. The thresholds are set as levels of the space that is used in a cache storage file system, expressed as a percent of total space available in the file system. For each cache storage file system, the ARSMAINT program compares the high threshold with a calculation of the amount of data stored in the file system as a percent of the actual data capacity of the storage volumes that belong to the file system. When the amount of data stored in a cache storage file system exceeds the high threshold, expiration begins. The ARSMAINT program deletes documents from the file system until the amount of space used in the cache storage file system falls below the low expiration threshold. The ARSMAINT program expires documents from each cache storage file system listed in the cache storage file system file. You can use the defaults for the expiration thresholds, or you can change the threshold values to identify the minimum and maximum amount of space for your cache storage file systems.
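The high and low thresholds behave like a hysteresis band, which the following Python simulation of one cache storage file system tries to illustrate. It is a sketch, not ARSMAINT itself: the `StorageObject` class is hypothetical, and the order in which expired objects are deleted (here, simply the order given) is an assumption, because the text does not specify it.

```python
from dataclasses import dataclass


@dataclass
class StorageObject:
    name: str
    size_mb: int
    expired: bool  # eligible for deletion under the application group's expiration policy


def expire_file_system(objects, capacity_mb, high_pct, low_pct):
    """Return the names of storage objects deleted from one cache file system."""
    used = sum(o.size_mb for o in objects)
    if 100.0 * used / capacity_mb <= high_pct:
        return []  # at or below the high threshold: expiration does not begin
    deleted = []
    for obj in objects:  # assumption: expired objects are visited in the order given
        if 100.0 * used / capacity_mb < low_pct:
            break  # usage has fallen below the low threshold: stop deleting
        if obj.expired:
            used -= obj.size_mb
            deleted.append(obj.name)
    return deleted
```

Only expired objects are ever deleted, so if too few objects have expired, usage can stay above the low threshold after the run.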
When expiration processing completes, keep the following in mind:
- The expiration process has probably reclaimed enough space to load new documents. (In our example, that is certainly true; some 40 percent of the cache storage file system is now free space.) If not, you need to check your high and low thresholds or add more storage volumes to the cache storage file system.
- The fact that a document is eligible to be removed from cache storage does not always mean that removing it is a good idea. For example, suppose you copy a document to cache storage for 60 days and to archive media for two years. After 60 days, the document is eligible to be removed from cache storage. However, your users continue to access the document regularly for 90, or even 120, days. With the right set of high and low thresholds, you can probably keep the document in cache storage for another 30 or more days beyond its expiration date, where faster devices can provide the most benefit to your users. (Of course, you could just change the load policy, but that's another story.)
Expiration processing in the system log
Message Number | Message Information | Explanation |
---|---|---|
109 | Cache Expiration (Date) (Min%) (Max%) (Server) | About to begin cache expiration processing on the specified server. Expiration processing uses the specified date (the default is "today"). Expiration processing begins on each cache file system whose space used exceeds the Max% (default 80%) and ends when the space used in the file system falls below the Min% (default 80%). |
196 | Cache Migration (ApplGrp) (ObjName) (Server) | One of these messages for each storage object deleted from cache storage. A storage object is eligible to be deleted when its "Cache Document Data for n Days" or "Life of Data" period has passed, whichever occurs first. |
124 | Filesystem Statistics (filesystem) (% full) (server) | One of these messages for each cache storage file system on the server. Informational only; reports the percentage of space used in the file system. |
Removing documents from archive storage
Documents become eligible for removal from archive storage when either of the following occurs:
- Administrators delete documents from client nodes.
- An archived document exceeds the time criteria in the archive copy group (how long archived copies are kept).
The archive storage manager does not delete information about expired documents from its database until expiration processing runs. You can run expiration processing either automatically or manually by command. You should make sure that expiration processing runs periodically to allow the archive storage manager to reuse storage pool space that is occupied by expired documents. When expiration processing runs, the archive storage manager deletes documents from its database. The storage space that these documents occupy then becomes reclaimable. See Reclaiming space in storage pools for more information.
You control automatic expiration processing by using the expiration processing interval (EXPINTERVAL) in the server options file (dsmserv.opt). You can set the option by editing the dsmserv.opt file (see the Installation and Configuration Guide for details).
If you use the server option to control when expiration processing occurs, the archive storage manager runs expiration processing each time that you start the server. After that, it runs expiration processing at the interval that you specified with the option, measured from the start time of the server.
You can manually start expiration processing by issuing the EXPIRE INVENTORY command. Expiration processing then deletes information about expired files from the database. You can schedule this command by using the DEFINE SCHEDULE command. If you schedule the EXPIRE INVENTORY command, set the expiration interval to 0 (zero) in the server options so that the archive storage manager does not run expiration processing when you start the server. You can control how long the expiration process runs by using the DURATION parameter with the EXPIRE INVENTORY command.
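The following Python sketch lists when automatic archive expiration would run under the server-option approach: once at server start, then every interval measured from that start, with an interval of 0 meaning no automatic runs. The function name is hypothetical, and treating EXPINTERVAL as a number of hours is an assumption based on Tivoli Storage Manager behavior; confirm the unit in your archive storage manager documentation.

```python
from datetime import datetime, timedelta


def expiration_runs(server_start: datetime, expinterval_hours: int, until: datetime):
    """List automatic expiration start times from server start up to `until`."""
    if expinterval_hours == 0:
        # Automatic expiration is off; schedule EXPIRE INVENTORY instead.
        return []
    runs = []
    step = timedelta(hours=expinterval_hours)
    t = server_start  # the first run happens when the server starts
    while t <= until:
        runs.append(t)
        t += step  # later runs are measured from the server start time
    return runs
```

Because the schedule is anchored to the server start time, restarting the server shifts every later run.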
Reclaiming space in storage pools
Space on a storage pool volume becomes reclaimable as documents expire or are deleted from the volume. For example, documents become obsolete because of aging.
The archive storage manager reclaims the space in storage pools based on a reclamation threshold that you can set for each storage pool. When the percentage of space that can be reclaimed on a volume rises above the reclamation threshold, the archive storage manager reclaims the volume. The archive storage manager rewrites documents on the volume to other volumes in the storage pool, making the original volume available for new documents.
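The threshold rule above can be sketched in Python as a planning step: select every volume whose reclaimable percentage rises above the pool's reclamation threshold, and total the unexpired data that must be rewritten to other volumes. The `Volume` class and field names are hypothetical, and measuring the percentage against volume capacity is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    capacity_mb: int
    live_mb: int         # space still held by unexpired documents
    reclaimable_mb: int  # space held by expired or deleted documents


def reclaimable_pct(v: Volume) -> float:
    return 100.0 * v.reclaimable_mb / v.capacity_mb


def plan_reclamation(volumes, threshold_pct):
    """Return (names of volumes to reclaim, MB of live data to rewrite elsewhere)."""
    # A volume is reclaimed only when its reclaimable space rises above the threshold.
    selected = [v for v in volumes if reclaimable_pct(v) > threshold_pct]
    return [v.name for v in selected], sum(v.live_mb for v in selected)
```

A lower threshold frees volumes sooner but means more live data is rewritten on each reclamation pass.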
The archive storage manager checks whether reclamation is needed at least once per hour and begins space reclamation for eligible volumes. You can set a reclamation threshold for each storage pool when you define or update the storage pool.
During reclamation, the archive storage manager copies the files to volumes in the same storage pool unless you have specified a reclamation storage pool. Use a reclamation storage pool to allow automatic reclamation for a storage pool with only one drive. See your archive storage manager documentation for details.
After the archive storage manager reclaims a volume:
- If you have explicitly defined the volume to the storage pool, the volume becomes available for reuse by that storage pool.
- If the volume was acquired as a scratch volume, the archive storage manager deletes the volume from its database.
Important: See your archive storage manager documentation for more information about reclamation processing, including choosing a reclamation threshold, reclaiming volumes in a storage pool with one drive, reclamation for WORM optical media, reclamation for copy storage pools, and reclamation of off-site volumes.