
Backing up the system

IBM® Content Classification system data, such as knowledge bases, decision plans, configuration data, and feedback data, is stored on the data server component. The system uses only one data server regardless of how many instances of Content Classification are installed. Depending on your system configuration, the amount of data can increase quickly unless you set up processes to periodically back up and remove data from the data server.

About this task

For example, the amount of data can increase quickly in the following cases:
  • Your client application uses functions that use text IDs, such as DecideWithID or CreateText.
  • You configure the system to accumulate a large amount of feedback before it is processed, or you configure the system to defer processing of the feedback.
  • You associate a knowledge base with learning data (SARC files).
Content Classification provides two mechanisms for backing up your data:
Back up the entire data server directory
You can configure Content Classification to periodically back up the contents of the data server directory to a backup directory. If there is a problem with the computer on which the data server resides or if the files become corrupted, you can roll back to a previous version by using the backup files. You can also store snapshots of knowledge bases and decision plans in the backup directory. If the current knowledge base or decision plan is corrupted, you can replace it with a backed up copy. For more information, see Backing up the data server.
Back up knowledge bases and decision plans when changes are made
When you select the Back up automatically option in the knowledge base or decision plan properties window in the Management Console, Content Classification automatically creates a backup copy of the knowledge base or decision plan each time it is changed. Each backup copy is assigned a unique version number. During classification, all decision results and suggestions are associated with the specific decision plan and knowledge base version that produced the decision. Storing previous versions of the decision plan and knowledge base is useful if you need to reproduce results from a previous version after the knowledge base or decision plan was changed.
The backup copies are created in the Classification_Home/dserverdir/VERSIONS directory on the data server. The file name is the name of the knowledge base or decision plan concatenated with the backup version number. Because new versions are created whenever feedback is processed or you make structural changes to the decision plan or knowledge base, many backup copies can be created depending on how you configured the system.
You can import a backup copy of a knowledge base or decision plan into Classification Workbench, analyze and edit it if necessary, and then publish the knowledge base or decision plan to the Content Classification server. Before you can import a backup decision plan into Classification Workbench, you must first convert it to DPN format by using the DpFromVersion88 utility. For more information, see the description of the Back up automatically option in Decision plan properties.
Best Practice: Set up a procedure to store the backup copies in a secure location for future reference and periodically delete the old data from the data server.
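The best practice above can be sketched as a small scheduled script. The following Python example is only an illustration under stated assumptions: the function name, the archive location, and the 30-day retention period are hypothetical, and the sketch assumes that backup copies in the VERSIONS directory are ordinary files that can be copied and removed at the file-system level. Confirm that assumption for your installation before you enable deletion.

```python
import shutil
import time
from pathlib import Path


def archive_old_versions(versions_dir, archive_dir, max_age_days=30,
                         delete=False, now=None):
    """Copy backup files older than max_age_days into archive_dir.

    Returns the names of the archived files. A source file is removed
    only when delete=True and the copy has the same size as the source.
    All names and defaults here are illustrative, not product settings.
    """
    versions_dir = Path(versions_dir)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age threshold in seconds

    archived = []
    for src in sorted(versions_dir.iterdir()):
        # Skip subdirectories and files that are newer than the cutoff.
        if not src.is_file() or src.stat().st_mtime >= cutoff:
            continue
        dest = archive_dir / src.name
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        if dest.stat().st_size != src.stat().st_size:
            raise IOError(f"Incomplete copy of {src}")
        if delete:
            src.unlink()
        archived.append(src.name)
    return archived
```

For example, a nightly job might call archive_old_versions("Classification_Home/dserverdir/VERSIONS", "/secure/archive", delete=False) first, and switch to delete=True only after the archived copies have been verified.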

In addition to backing up the data that is stored on the data server, also back up the decision results. For example, if you use the Classification Center to classify and review your content, you can store a history of all classification decisions by periodically backing up the Classification_Home/ECMTools/logs directory on the Classification Center server.

Important: If you do not set up a backup policy for the Classification Center events log, the history of classification decisions is not saved. Classification Center recycles its logs: when the maximum number of log files is reached, older versions of the events.csv file are deleted. The maximum number of log files is 10 and the maximum size of each file is 10 MB. A best practice is to back up the log files before the maximum number of log files is reached and to append the date of the backup to each log file name.
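As a minimal sketch of this practice, the following Python example copies the events log files to a backup directory and adds the backup date to each file name. The function name, the backup directory, and the events*.csv* file pattern are assumptions made for illustration; check how the recycled log files are actually named in your logs directory and adjust the pattern accordingly.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_event_logs(logs_dir, backup_dir, pattern="events*.csv*", today=None):
    """Copy matching log files to backup_dir, adding the backup date to each name.

    The default pattern is an assumption about how recycled events logs
    are named; it is not taken from the product documentation.
    """
    stamp = (today or date.today()).isoformat()  # for example 2024-01-31
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(Path(logs_dir).glob(pattern)):
        if not src.is_file():
            continue
        # Add the date as a suffix so that earlier backups are not overwritten
        # by backups that are taken on later days.
        dest = backup_dir / f"{src.name}.{stamp}"
        shutil.copy2(src, dest)
        copied.append(dest.name)
    return copied
```

Because the date is the only distinguishing suffix, running the script more than once on the same day overwrites that day's copies. Schedule it often enough that fewer than 10 new log files accumulate between runs.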

For the other Classification Center logs, a new log file is created when the maximum log file size, which is 10 MB, is reached. To change these default settings, you can edit the Classification_Home\ECMTools\ClassificationCenter\WebContent\WEB-INF\classes\log4j.properties file.
