Job data purging

Job data includes files in the job directory and the related information in the internal database. To avoid data overflowing the file system and database, job data must be cleaned up after a job is completed (Done or Exited). IBM Spectrum LSF Application Center supports automatic job data purging. You can define how long to keep job data and job information for Done and Exited jobs before the data is automatically purged.

When data is purged

Data is purged once a day, at the time defined by the AutoPurgeTime parameter in $GUI_CONFDIR/pmc.conf. The default is 3:00 am.

To view the scheduled time for automatic purging of job data, see the Purge Time column on the Workload page.
Note: If you don't see the Purge Time column, add it from the settings menu on the Workload page.

How data purging works

IBM Spectrum LSF Application Center runs the purger task daily, according to the configurable parameter AutoPurgeTime in the $GUI_CONFDIR/pmc.conf file.

The purger task checks all job data to see whether it is expired. If it is expired, the purger removes the job data.

Note that custom job directories created on server hosts through the cd $NEW_DIR or -cwd $NEWDIR command cannot be manually deleted by a user. They are also not deleted automatically through data purging unless the parameter DELETE_REMOTE_DIR=Y is set in the $GUI_CONFDIR/pmc.conf file.

Data expiry time is calculated as:

job data expiry time = job completion time + FINISH_JOB_TIME_TO_LIVE

The parameter FINISH_JOB_TIME_TO_LIVE in the $GUI_CONFDIR/pmc.conf file is a system-wide parameter for all applications. It defines how many days job data and job information are kept in the system for all Done and Exited jobs before they are automatically purged. The default is 14 days.

This parameter takes effect only in two cases: when an application does not define a time-to-live (the repository ttl parameter) in its XML application definition file, or when a job is submitted outside of IBM Spectrum LSF Application Center (such as with the LSF command line). The application definition file is $GUI_CONFDIR/application/published/application_name/application_name.xml for published applications and $GUI_CONFDIR/application/draft/application_name/application_name.xml for unpublished applications.

If there is a defined repository ttl parameter for an application, the application-specific setting overrides the FINISH_JOB_TIME_TO_LIVE system setting.

If there is a defined repository ttl parameter for an application, data expiry time is calculated as follows for that application:

job data expiry time = job completion time + time to live (ttl)
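The two expiry formulas and the override rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the purger's actual implementation: the function names are hypothetical, and the sketch assumes that the repository ttl value is expressed in days like FINISH_JOB_TIME_TO_LIVE, which this section does not state.

```python
from datetime import datetime, timedelta
from typing import Optional

# Documented default of FINISH_JOB_TIME_TO_LIVE (days).
SYSTEM_TTL_DAYS = 14


def job_data_expiry(completion_time: datetime,
                    app_ttl_days: Optional[int] = None,
                    system_ttl_days: int = SYSTEM_TTL_DAYS) -> datetime:
    """Return the expiry time of a completed job's data.

    app_ttl_days stands in for the application's repository ttl
    (assumed here to be in days). When it is None, the system-wide
    FINISH_JOB_TIME_TO_LIVE value applies instead.
    """
    ttl_days = app_ttl_days if app_ttl_days is not None else system_ttl_days
    return completion_time + timedelta(days=ttl_days)


def is_expired(completion_time: datetime,
               now: datetime,
               app_ttl_days: Optional[int] = None,
               system_ttl_days: int = SYSTEM_TTL_DAYS) -> bool:
    """True if the job data has reached its expiry time at 'now'."""
    return now >= job_data_expiry(completion_time, app_ttl_days, system_ttl_days)


if __name__ == "__main__":
    done_at = datetime(2024, 1, 1, 12, 0)
    print(job_data_expiry(done_at))                   # system default applies
    print(job_data_expiry(done_at, app_ttl_days=10))  # application ttl overrides
```

Because the purger runs once a day, expired data would remain on disk until the next scheduled run after its expiry time.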

Data purging and job groups

When the system automatically purges job data, it checks whether the job belongs to a job group. If any job in the job group does not have the status Done or Exited, the system does not purge job data for any job in that group. When all jobs in the group have the status Done or Exited, data purging occurs in the same way as for other job data.
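The job group rule amounts to an all-or-nothing check on job status. A minimal Python sketch follows; the function name and the representation of a group as a list of status strings are assumptions made for illustration, not part of the product.

```python
from typing import Iterable

# Statuses that count as completed for purging purposes.
FINISHED_STATUSES = {"Done", "Exited"}


def group_ready_for_purge(statuses: Iterable[str]) -> bool:
    """True only when every job in the group is Done or Exited.

    A single pending or running job blocks purging for the whole
    group. Note that an empty group yields True in this sketch.
    """
    return all(status in FINISHED_STATUSES for status in statuses)


if __name__ == "__main__":
    print(group_ready_for_purge(["Done", "Exited", "Done"]))
    print(group_ready_for_purge(["Done", "Running"]))
```

Once the check passes, each job in the group would still be subject to the usual expiry calculation before its data is removed.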

How parameter settings affect what users see

The settings of the parameters FINISH_JOB_TIME_TO_LIVE and repository ttl for an application affect what users see on the Workload page and the Data page. For example, if you define that job data and job information are to be kept for 10 days, users can see job information and job data for all pending and running jobs, and also for jobs that ended within the past 10 days. Job data and information for jobs that ended more than 10 days ago are purged and are no longer visible.
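The 10-day example can be expressed as a simple visibility check. The Python sketch below is hypothetical: the function name and arguments are illustrative, and it ignores the delay between a job's expiry time and the next daily purge run, during which the data would still be visible.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_visible(status: str,
               end_time: Optional[datetime],
               now: datetime,
               ttl_days: int = 10) -> bool:
    """Approximate whether a job still appears on the Workload and Data pages.

    Jobs that are not Done or Exited are always visible. Completed
    jobs are visible while they ended within the last ttl_days days.
    """
    if status not in ("Done", "Exited"):
        return True
    if end_time is None:
        # A completed job without an end time is treated as visible here.
        return True
    return now - end_time <= timedelta(days=ttl_days)


if __name__ == "__main__":
    now = datetime(2024, 3, 20, 9, 0)
    print(is_visible("Running", None, now))
    print(is_visible("Done", datetime(2024, 3, 15, 9, 0), now))
    print(is_visible("Exited", datetime(2024, 3, 1, 9, 0), now))
```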