Removing index data

About this task

Indexes expire (become eligible for removal) when their life of data period has passed. The indexes, and the documents that they point to, can then be removed from the system. When you remove an index, information about the document to which it points is removed from the database, and the document can no longer be retrieved. However, an index that is eligible for removal is not deleted immediately. Content Manager OnDemand does not delete expired index data from the database until expiration processing runs.

The application group expiration policy determines when index data is eligible for deletion from the database. You define the expiration policy when you create the application group. The following properties on the Storage Management page comprise the expiration policy:
  • Life of Data and Indexes. The length of time in days to maintain index data and documents on the system. After the index data has been on the system for this number of days, it is eligible to be deleted.
    Note: If you specify Never Expire, then expiration processing is disabled for the application group. (That is, index data will not be removed from the database.)
  • Expiration Type. Determines how much index data is deleted at a time: an individual index, all the indexes for a load, or an entire table of index data. When Content Manager OnDemand deletes index data, it deletes a row (if the Expiration Type is Document), deletes all the rows associated with the load (if the Expiration Type is Load), or drops a table (if the Expiration Type is Segment). The amount of index data in a table and the number of reports that the data represents are determined by the Database Organization. If the Database Organization is Multiple Loads per Database Table, then by default, a table of index data can hold up to 10 million indexes. These types of tables usually hold the indexes for many reports.

A table of index data is not eligible to be deleted until the latest date in any of its rows reaches the Life of Data and Indexes period. For example, suppose that the Life of Data and Indexes is set to 365 days, the Expiration Type is set to Segment, and the Database Organization is set to Multiple Loads per Database Table. By default, a table will contain approximately 10 million rows. Further, suppose that a report is loaded into the application group once every month and that each report adds one million rows to the database. Each table can hold the index data from approximately ten reports. Using these assumptions, the data that is loaded into the application group in January will not be eligible to be deleted by expiration processing until November of the following year. If you need to remove the index data for a report as soon as it reaches its Life of Data and Indexes period, then set the Database Organization to Single Load per Database Table, set the Expiration Type to Segment or Load, and run expiration processing at least once a month.
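The effect of the Expiration Type on how much index data becomes removable can be illustrated with a small model. The following Python sketch is not how Content Manager OnDemand is implemented. The table name, load identifiers, and dates are hypothetical, and the rule that a load becomes eligible when the latest date in the load has expired is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class IndexRow:
    table: str      # hypothetical table name
    load_id: str    # hypothetical load identifier
    doc_date: date  # date stored in the index row

def expired(d, today, life_days):
    # Eligible once the data has been on the system for life_days days.
    return d + timedelta(days=life_days) <= today

def rows_removed(rows, expiration_type, today, life_days):
    """Return the rows that would be removed for the given Expiration Type."""
    if expiration_type == "Document":
        # Row by row: each row is judged on its own date.
        return [r for r in rows if expired(r.doc_date, today, life_days)]
    # Load and Segment: the whole group is judged on its latest date.
    # (For Load, this grouping rule is an assumption.)
    key = {"Load": lambda r: r.load_id, "Segment": lambda r: r.table}[expiration_type]
    latest = {}
    for r in rows:
        k = key(r)
        latest[k] = max(latest.get(k, r.doc_date), r.doc_date)
    return [r for r in rows if expired(latest[key(r)], today, life_days)]

# One table that holds two loads (Multiple Loads per Database Table).
ROWS = [
    IndexRow("T1", "A", date(1999, 1, 10)),
    IndexRow("T1", "A", date(1999, 1, 15)),
    IndexRow("T1", "B", date(1999, 2, 15)),
    IndexRow("T1", "B", date(1999, 3, 15)),
]
TODAY = date(2000, 3, 4)
LIFE_DAYS = 365

counts = {t: len(rows_removed(ROWS, t, TODAY, LIFE_DAYS))
          for t in ("Document", "Load", "Segment")}
print(counts)  # {'Document': 3, 'Load': 2, 'Segment': 0}
```

In this model, Segment expiration removes nothing because one row in the table has not yet expired, which mirrors the rule that the latest date in any row holds back the entire table.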

Content Manager OnDemand and the archive storage manager delete the documents that expired index data points to independently of each other. Content Manager OnDemand uses the application group's expiration policy to determine when indexes and documents expire and should be removed from the system. The archive storage manager marks documents for removal based on the criteria specified in the archive copy group. Because the two operate independently, you should specify the same criteria to both: the Life of Data and Indexes, which is used by Content Manager OnDemand, and the Retention Period, which is used by the archive storage manager, should specify the same value.

Content Manager OnDemand does not explicitly delete data stored with Segment or Document expiration from the external storage manager (such as TSM, OAM, or cloud storage). However, the data might still expire in the storage manager based on the TSM, OAM, or cloud storage expiration settings.

The following figures show an example of expiration processing. For purposes of the example, assume that the Life of Data and Indexes is 365 days, the Database Organization is Single Load per Database Table, and the Expiration Type is Load. Further, assume that one report is loaded into the application group every month, beginning on January 15, 1999, and that expiration processing has never been run on this particular application group.

Figure 1 shows an example of the application group index data before expiration processing begins. The table on the left represents the segment table for the application group. A segment table contains one row for each table of application group data. In the example, a table of application group data contains the index records for one report. A row in the segment table contains the latest date found in the report (or the load date, if the report does not contain a date). For expiration processing, Content Manager OnDemand uses the date from the segment table to determine when to drop a table.
Figure 1. Removing index data. Part 1 of 2
Figure 2 shows an example of the application group index data after expiration processing ends. For purposes of the example, assume that expiration processing ran on March 4, 2000. That date, along with the criteria specified in the expiration policy (specifically, the Life of Data and Indexes is 365 days), causes the ARSMAINT program to drop two tables of application group index data: 1RBA, which has a date of January 15, 1999, and 2RBA, which has a date of February 15, 1999. Content Manager OnDemand also deleted the rows in the segment table that pointed to the application group tables that were dropped.
Figure 2. Removing index data. Part 2 of 2
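The date arithmetic behind this example can be checked with a short Python sketch. It only models the comparison between the segment table dates and the run date, and is not the ARSMAINT implementation. The table names 1RBA and 2RBA come from the example. The names 3RBA and 4RBA are hypothetical and are assumed to follow the same monthly pattern.

```python
from datetime import date, timedelta

LIFE_OF_DATA_DAYS = 365
RUN_DATE = date(2000, 3, 4)  # date on which expiration processing runs

# Model of the segment table: table name -> latest date in that table.
# 3RBA and 4RBA are hypothetical names used for illustration.
segment_table = {
    "1RBA": date(1999, 1, 15),
    "2RBA": date(1999, 2, 15),
    "3RBA": date(1999, 3, 15),
    "4RBA": date(1999, 4, 15),
}

def tables_to_drop(segments, run_date, life_days):
    """Return the tables whose latest date is at least life_days old."""
    cutoff = run_date - timedelta(days=life_days)
    return sorted(name for name, latest in segments.items() if latest <= cutoff)

dropped = tables_to_drop(segment_table, RUN_DATE, LIFE_OF_DATA_DAYS)
print(dropped)  # ['1RBA', '2RBA']
```

The table dated March 15, 1999 is kept because it has been on the system for fewer than 365 days on March 4, 2000. It would become eligible at the next run on or after March 14, 2000.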