Add automatic content classification to your IBM FileNet P8

Step-by-step examples showing how to install, configure, and integrate

Concerned about records compliance and legal discovery effectiveness? Challenged by maintaining consistent and reliable access to content items in various repositories? IBM® Classification Module Version 8.5 can help you to address these issues. This new IBM Enterprise Content Management (ECM) taxonomy management solution understands the full text of content, adapts to the business needs, and automates the classification of content in your IBM FileNet P8 systems. This article provides step-by-step instructions for performing the seamless integration between IBM Classification Module and IBM FileNet P8. It then shows you how to use IBM ECM Classification tools (Content Extractor, Classifier, and Classification Review tool), along with the IBM Classification Module server component and the Classification Workbench, to automate the content classification in the integrated environment.


Xiaomei Wang (xiaomeiw@ca.ibm.com), ECM Partner Technical Enablement Team, IBM

Xiaomei Wang is a technical consultant on the Enterprise Content Management (ECM) Partner Technical Enablement team at the IBM Toronto Lab. Her current assignment is assisting business partners with integrating IBM Discovery products into their solutions. She has worked for eight years at IBM, and has extensive knowledge of the DB2 family of products. Xiaomei is an IBM Certified Solution Expert (Content Manager, DB2 for Linux, UNIX, and Windows Database Administration, DB2 for Linux, UNIX, and Windows Application Development, DB2 for Linux, UNIX, and Windows Advanced Technical Expert, and Business Intelligence), and also a Red Hat Certified Technician. You can reach her at xiaomeiw@ca.ibm.com.



Shirley Braley (Shirley_Braley@us.ibm.com), Advisory Software Engineer, IBM Corporation

Shirley is an Advisory Software Engineer with the Content Management and Discovery team in IBM's Information Management division. She has been with IBM for a total of 15 years, 13 of which were in the Lotus Division, where she specialized in database management tools and applications. Her experience covers a wide range of areas including development of database drivers, adding ODBC connectivity to an early version of Lotus Notes, and creation of application generation tools for IBM LearningSpace. She is currently developing components for IBM Classification Module with a focus on integration with other products in the IBM portfolio. You can reach her at shirley_braley@us.ibm.com.



31 January 2008


Introduction

In ECM, taxonomies ensure that content is accurately cataloged and easily accessible. Having consistent and reliable access to unstructured content is the foundation to realizing the business benefits of ECM, and all subsequent content-centric enterprise applications will realize their return on investment (ROI) by leveraging this essential capability.

IBM Classification Module automates the process of categorizing your content by reading and analyzing the full text of each document. It analyzes the entire document, discerns the topics in the text, and then assigns the document to the proper category. It is not fooled by noise in your content, such as misspellings, abbreviations, shorthand, or technical terms. Moreover, the IBM Classification Module adapts to the unique nature of your business by learning to identify different categories from examples you provide to it. As you provide feedback on its performance, it adjusts in real time, immediately taking into account the corrections you have made. In this way, its accuracy keeps pace with your business, rapidly adjusting to changes as they occur.

The IBM Classification Module integration for IBM FileNet P8 can be deployed on Windows®, AIX®, Solaris, and Linux® environments. The example in this article is specific to the integration on a Microsoft® Windows platform. However, the key concepts and information provided are relevant to any platform.


Integration architecture overview

IBM Classification Module for IBM FileNet P8 is a system that automatically classifies new content for insertion into IBM FileNet P8, or filters out content that doesn't fit the profile of managed content. It can also reclassify and move existing P8 documents to the correct folders or document classes. Unlike other classification systems that are based on rules only, the Classification Module is based on a combination of text analysis and rules, and incorporates real-time learning that adapts to changing business needs and becomes more accurate over time.

Figure 1. Integration architecture overview

As shown in Figure 1, at the core of this solution is the IBM Classification Module product, which has been in the marketplace for many years and has proven to be scalable and reliable in demanding IT environments. It consists of the following three core components:

  • Classification Application Program Interface (API)

    The Classification API provides C, Java™, .NET, or COM client API libraries to enable rapid development of various client applications that interact directly with the Classification server component.

  • Classification server

    Embedded with natural language processing capabilities, the Classification server classifies free-form text by using its Relationship Modeling Engine and a predefined knowledge base (KB) to make decisions.

  • Classification Workbench

    Classification Workbench allows you to create and analyze a knowledge base, evaluate the KB's performance using reports and graphical diagnostics, and work with a collection of texts or messages known as a corpus for analysis, training, and learning.

    In addition, the Taxonomy Proposer is a new tool shipped with Classification Workbench 8.5. It can assist users in creating a taxonomy starting from scratch or from a partial one, where it uses custom clustering algorithms to analyze and group similar documents together.

On top of the core product is the newly added integration asset for providing a taxonomy automation solution to IBM FileNet P8. The integration for IBM FileNet P8 includes the following components:

  • Classifier

    Automatically classifies and filters out documents, and sets aside a configured percentage of documents for audit and manual review.

  • Classification Review Tool

    A browser-based application that enables online learning by manually confirming or correcting automatic classification.

  • Content Extractor

    A command-line tool that extracts sample content from the IBM FileNet P8 repository to train a KB and enable automatic classification.


Integration and classification workflow

The example in this article shows you how to first deploy IBM Classification Module integration for IBM FileNet P8, and then perform the content classification in the integrated environment by following the workflow displayed in Figure 2.

Figure 2. Integration and classification workflow

Phase 1. Install and configure IBM Classification Module integration software stack

This section uses step-by-step screen shots to illustrate how to install and configure IBM Classification Module Integration software stack.

Task 1. Install IBM Classification Module and integration components for IBM FileNet P8

This example covers the installation of IBM Classification Module Integration software stack on the Windows platform. It is assumed that you currently have a working FileNet P8 Version 3.5 server and are able to connect to it.

Before you begin, ensure that:

  • All system requirements are met.
  • Your installation ID has administrator privileges on the computer where the Classification Module components are to be installed.
  1. Run the Classification Module Version 8.5 installation wizard, icm85_setup_win32.exe, and click Next.
    Figure 1-1-1. Welcome screen
  2. Enter the installation path, and click Next.
    Figure 1-1-2. Directory path
  3. Accept the license agreement, and click Next.
  4. Choose the Custom installation type, and click Next.
    Note: If you select the Basic installation type, the Classification server, client, and workbench components will be installed, but the ECM Tools components will not be.
    Figure 1-1-3. Installation options
  5. Select the installation components, and click Next. In this example, check the options to include Classification Module server, client, workbench, and integration components.
    Figure 1-1-4. Feature selection
  6. Select the option to install the administration and data server on this computer.

    If you have already installed the IBM Classification Module server on another server, you can connect to it on a remote computer.

    Figure 1-1-5. Administration and data server installation
  7. Accept the default ports for the administration and data servers.

    You can use other port numbers here, but make sure that those port numbers are not used by any other processes on this computer.

    Figure 1-1-6. Port selection
  8. Install the listener component to handle client requests on this computer.
    Ensure that the port is not used by other processes in your environment.
    Figure 1-1-7. Listener option
  9. In this example, install the Classification Review Tool into New Tomcat.
    There are four options for installing the Classification Review Tool. To learn more about each installation option, refer to the Integration for IBM FileNet P8 User's Guide.
    Figure 1-1-8. Classification Review Tool installation type
  10. To install Tomcat 5.0 as part of the IBM Classification Module installation, enter the following information:
    • Directory where Tomcat is installed
    • Home directory for Java 1.4.2
    • Admin user name and password
    • Port of the Tomcat server
    The installation wizard installs the Tomcat server first, and then deploys the Classification Review Tool war file into it.
    Figure 1-1-9. Tomcat servlet container installation information
  11. Review the installation summary information, and click Install.
  12. After the IBM Classification Module components have been installed, you are prompted to reboot the machine. You should do this before running the IBM Classification Module server.
  13. After the install completes, you should see a directory structure similar to the one shown in Figure 1-1-10. Check the install_log.txt file for detailed information on the installation.
    Figure 1-1-10. IBM Classification Module directory structure
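
Steps 7 through 10 ask you to choose ports that no other process is using. One way to check this before you run the wizard is to probe each candidate port. The following is a minimal sketch and not part of the product: the helper name port_is_free is made up for illustration, and 18087 is simply the listener port that appears later in this article. Substitute the administration, data server, and Tomcat ports that you intend to use.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if no process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepts the connection.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # Hypothetical candidate list; 18087 is the listener port used in Phase 3.
    for candidate in (18087,):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

The probe only detects processes that are listening at the moment you run it, so run it shortly before the installation.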

Task 2. Configure IBM FileNet P8 parameters and connectivity

You must perform the following configuration steps to prepare your IBM FileNet P8 object store and enable communications before you classify documents:

  • Add IBM FileNet P8 properties (metadata).
  • Add properties to IBM FileNet P8 document class.
  • Add IBM FileNet P8 folders and document classes.
  • Configure connectivity between IBM Classification Module and IBM FileNet P8.

To add IBM FileNet P8 metadata:

  1. Start the IBM FileNet Enterprise Manager.
    For detailed instructions on starting and using the IBM FileNet Enterprise Manager, see the IBM FileNet P8 documentation.
  2. Create a Classification Module AddOn resource by importing the AddOn file (F:\IBM\ICM\ECMTools\icm_prps_addon.xml) that contains Classification Module properties into IBM FileNet P8.
    Figure 1-2-1. Create new AddOn
  3. Install the Classification Module AddOn.
    1. In the IBM FileNet Enterprise Manager, right-click the object store that you are using with Classification Module and select All tasks > Install AddOn.
      Figure 1-2-2. AddOn installation
    2. Select the newly created AddOn, ICM AddOn, and click Install.
      Figure 1-2-3. AddOn installation cont.

To add properties to IBM FileNet P8 document class:

  1. In the IBM FileNet Enterprise Manager, select the object store you are working with, right-click Document Class, and then select Properties > Properties Definitions > Add.
  2. Select all properties beginning with "ICM_" and add them to the base class.
    Figure 1-2-4. Add properties to IBM FileNet P8 document class

To add IBM FileNet P8 folders and document classes:

Important: The folder and document class names that you define here in the IBM FileNet P8 object store must match the names specified in the ECM Classification tools (that is, Content Extractor, Classifier, and Classification Review Tool). If you change a default name, be sure to update it in both locations.

  1. Create the folders in IBM FileNet P8. Use the default names listed below or specify your own.
    Figure 1-2-5. Folders for IBM FileNet P8
  2. Create the document classes in IBM FileNet P8. Use the default names listed below or specify your own.
    The document classes are required only if you plan to classify documents by document class.
    Figure 1-2-6. Document classes in IBM FileNet P8

To configure connectivity between IBM Classification Module and IBM FileNet P8:

  • Configuring connectivity to IBM FileNet P8 depends on factors such as the IBM FileNet P8 version, Java Web application, and client-server communication. Configuring connectivity might be as simple as specifying the IBM FileNet P8 IP address in the WcmApiConfig.properties file in F:\IBM\ICM\ECMTools\conf. Here is an example of what this file would look like. Substitute the IP address and port number of your FileNet P8 server in the first three lines. Consult your FileNet P8 administrator if you are unsure what the correct URL is.
    RemoteServerUrl =http://9.148.198.23:8008/ApplicationEngine/xcmisasoap.dll
    RemoteServerUploadUrl =http://9.148.198.23:8008/ApplicationEngine/doccontent.dll
    RemoteServerDownloadUrl =http://9.148.198.23:8008/ApplicationEngine/doccontent.dll
    
    CredentialsProtection =Clear
    CryptoKeyFile =C:\\Program Files\\FileNet\\Authentication\\CryptoKeyFile.properties
    
    CredentialsProtection/UserToken = Symmetric
    CryptoKeyFile/UserToken =C:\\Program Files\\FileNet\\Authentication\\UTCryptoKeyFile.properties

    Configuring connectivity might also involve additional configuration files and communication library files (JARs).
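
In the example file above, all three RemoteServer URLs use the same IP address and port, so a mismatch after editing is most likely a copy-and-paste slip. The following sketch shows one way to check for that. It is an illustration only: the function names are hypothetical, the parser is deliberately simplified (it ignores the escape and line-continuation rules of the properties format), and a deployment where the three URLs legitimately differ would not need this check.

```python
from urllib.parse import urlsplit

URL_KEYS = ("RemoteServerUrl", "RemoteServerUploadUrl", "RemoteServerDownloadUrl")


def read_properties(text: str) -> dict:
    """Parse simple 'key = value' lines; comments and blank lines are skipped.

    Simplified on purpose: no support for line continuations or escapes.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def server_addresses(props: dict) -> dict:
    """Map each URL key that is present to its host:port part."""
    return {key: urlsplit(props[key]).netloc for key in URL_KEYS if key in props}


def check_same_server(props: dict) -> list:
    """Return a list of problems found; an empty list means the check passed."""
    problems = [f"missing {key}" for key in URL_KEYS if key not in props]
    distinct = set(server_addresses(props).values())
    if len(distinct) > 1:
        problems.append(f"URLs point at different servers: {sorted(distinct)}")
    return problems
```

To use it, read the contents of WcmApiConfig.properties into a string, pass the string to read_properties, and print whatever check_same_server returns.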

Phase 2. Train the IBM Classification Module System with IBM FileNet P8 content

This section uses step-by-step screen shots to illustrate how to train an IBM Classification Module system so that it learns how to classify based on sample content extracted from IBM FileNet P8. Training consists of the three tasks below:

  • Extract content from FileNet P8
  • Create and analyze a knowledge base
  • Configure IBM Classification Module: KB and dictionary

Task 1. Extract content from FileNet P8

You use the Content Extractor tool to extract a set of sample documents from FileNet P8 object store.

Before you begin, ensure that IBM FileNet P8 is running.

  1. Configure the Content Extractor by editing the Extractor.properties file. You must set these properties before you run the Content Extractor.
  2. Locate and edit the Extractor.properties file in the F:\IBM\ICM\ECMTools\conf directory.
    You can use the default values for most of the properties. But you might want to change the following properties:
    • logFile: The Content Extractor log file
      logFile = logs/icm_p8_extractor.log
    • XmlDirectory: The directory where XML output by the Content Extractor is written. This directory must exist and be empty prior to the Content Extractor run.
      XmlDirectory = extractorOutput

    Which documents to extract?
    The criteria below are AND-ed together, that is, path AND doc class AND Date. If you want the OR logic, do separate extraction runs.

    • Path_1: From a specific FileNet P8 folder
      Path_1 = QA/Demo
    • IgnorePath_1: Exclude a specific FileNet P8 folder
      IgnorePath_1 = QA/Demo/Accounting
    • With_1: By document property
      With_1 = DocumentClass=Technote
    • Without_1: Excluded by document property
      Without_1 = DocumentClass=TopSecret
    • Date: Documents that were modified since a date
      Date = 13-Jul-2007

    How many documents to extract?


    • FolderFraction: Fraction of eligible documents to extract from each folder
      FolderFraction = 0.2
    • FileMax: Maximum number of bytes to extract from a document
      FileMax = 600000
  3. From a command prompt, go to the ECMTools directory in the IBM Classification Module installation path.
    cd F:\IBM\ICM\ECMTools
  4. Run the Content Extractor with the following command:
    Extractor -u Administrator -p ihpdep -f conf\extractor.properties -v
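
The selection properties in step 2 are combined with AND, which is easy to misread when several of them are set at once. The sketch below restates the example values as a single predicate. It is not the Content Extractor's implementation: the Document class and function names are hypothetical, and two behaviors are assumptions made for illustration, namely that a path setting also covers its subfolders and that the Date comparison is inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    folder: str        # FileNet P8 folder path, e.g. "QA/Demo/Reports"
    properties: dict   # document properties, e.g. {"DocumentClass": "Technote"}
    modified: date     # last modification date


def under(folder: str, root: str) -> bool:
    """True if folder is root itself or one of its subfolders (assumed)."""
    return folder == root or folder.startswith(root + "/")


def is_eligible(doc: Document) -> bool:
    """All criteria must hold at once (AND), using the example values."""
    return (
        under(doc.folder, "QA/Demo")                              # Path_1
        and not under(doc.folder, "QA/Demo/Accounting")           # IgnorePath_1
        and doc.properties.get("DocumentClass") == "Technote"     # With_1
        and doc.properties.get("DocumentClass") != "TopSecret"    # Without_1
        and doc.modified >= date(2007, 7, 13)                     # Date (inclusive, assumed)
    )
```

Because every condition must hold, a document that matches Path_1 but has a different document class is skipped. To get OR behavior, run the Content Extractor once per alternative, as the note in step 2 says.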

Task 2. Create and analyze a KB

You use IBM Classification Workbench to create and analyze a KB using content extracted from the FileNet P8 object store.

  1. Launch IBM Classification Workbench by navigating to Start > Programs > IBM Classification Module 8.5 > Classification Workbench.
  2. On the Project Explorer that appears, click New.
    Figure 2-2-1. Project Explorer
  3. On the New Project screen, enter a project name, FocusPlusProject, and click Next.
    Figure 2-2-2. Add new project name
  4. On the Import Corpus screen, under External formats, select XML (eXtensible Markup Language), and click Next.
    Figure 2-2-3. Import a corpus
  5. Browse to the folder containing the XML files generated by the Content Extractor. The folder path is what is defined through the Content Extractor property, XmlDirectory. Click Next.
    Figure 2-2-4. Import corpus location
  6. Select the check box Scan XML data files for corpus fields before importing the corpus, and click Finish.
    Figure 2-2-5. XML import method
  7. After importing the XML files, select View > View Project Details from the menu to display the project detail panel on the right side if it is not already displayed.
    Figure 2-2-6. Display the project detail panel
  8. On the Project Details Fields tab, right-click the Document Title field, and select Edit Field. In the Corpus Field Properties window, set Type to string and NLP Usage to Plain Text, and click OK.
    Figure 2-2-7. Edit document title properties
  9. Perform the same operations as above to edit the ICM_CONTENT field by setting its Type to string and NLP Usage to Plain Text.
    Figure 2-2-8. Edit ICM_CONTENT properties
  10. On the Project Details Fields tab, right-click the ICM_folders field, and select Use as Categories. Click OK in the pop-up message window.
    Figure 2-2-9. Define ICM_folders field
  11. On the Project Details Categories tab, review the list of newly created categories.
    Figure 2-2-10. List of newly created categories
  12. Create and analyze a new KB:
    1. Click KB Wizard on the toolbar to launch the Create, Analyze, and Learn Wizard.
    2. Click Next to display the first options window.
    3. Select Create and analyze KB using active view, and choose Create using even, analyze using odd.
    4. Click Next to display the next options window, and keep the default values.
    5. Click Next to display the Match Fields window. By default, the Add Match Field box is checked and the Number of matches to display field is set to 5. Keep the default values in this example.
    6. Click Finish to continue. The Status Information screen lets you view the Create and Analyze processes as they progress.
    7. When processing is complete, click Close.
    8. View the results of the knowledge base analysis in the workspace.
    9. Note the matching values. There are five matching columns that show potential matching categories for every other item.
    10. Note that the Classification Workbench has analyzed only the odd-numbered items.
  13. (Optional) Click Reports on the toolbar to open the View Reports window. Check the following reports and graphs:
    • Cumulative success
    • KB data sheet
    • Cumulative success graph
    • Total precision vs. recall graph

    Click OK to run the reports. View the reports that you have generated. These reports provide a summary of information about the categories defined in your KB. For more information on how to analyze and fine tune a KB, refer to Classification Workbench User's Guide.
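
To make the "create using even, analyze using odd" option and the cumulative success report more concrete, here is a small sketch of the idea. It is not the Workbench's code. It assumes that corpus items are numbered from 1 and that cumulative success at rank k means the share of analyzed items whose correct category appears among the top k matches; both the function names and the sample data used with them are made up.

```python
def split_even_odd(items: list) -> tuple:
    """Split a corpus into (create, analyze) sets by 1-based item number."""
    create = [item for n, item in enumerate(items, start=1) if n % 2 == 0]
    analyze = [item for n, item in enumerate(items, start=1) if n % 2 == 1]
    return create, analyze


def cumulative_success(results: list, max_rank: int = 5) -> list:
    """Return the success rate at ranks 1..max_rank.

    results holds (correct_category, ranked_matches) pairs, where
    ranked_matches lists the suggested categories, best match first.
    """
    if not results:
        return []
    rates = []
    for k in range(1, max_rank + 1):
        hits = sum(1 for truth, matches in results if truth in matches[:k])
        rates.append(hits / len(results))
    return rates
```

Holding back the odd-numbered items means the KB is evaluated on documents it was not built from, which gives a more honest picture of accuracy than testing on the training items. The default max_rank of 5 mirrors the five match columns kept in step 12.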

Task 3. Configure IBM Classification Module: KB and dictionary

You register the newly created KB with the classification engine through the IBM Classification Manager.

  1. Launch the IBM Classification Manager by navigating to Start > Programs > IBM Classification Module 8.5 > Classification Manager.
  2. Enter the server listener URL, using the same URL that you defined during the installation.
    Figure 2-3-1. Server listener URL
  3. In the console tree, select Knowledge Bases, and then click Add on the toolbar to add a new KB.
    Figure 2-3-2. Add a new knowledge base
  4. In the Add Knowledge Base window, define the fields below, and click OK:
    • Knowledge base name: FocusPlus
    • Select Import statistics from file
    • Browse to the FocusPlusProject.kb file created by IBM Classification Workbench. The kb file is located in the project directory F:\IBM\ICM\Workbench\Classification Workbench\Projects_Unicode\FocusPlusProject.
    • Select Access file from server.
    • Select English as the supported language.

    Figure 2-3-3. Define fields to create knowledge base
  5. The IBM Classification Module receives text as a series of fields (also known as name-value pairs). The dictionary defines the data type and the method of language processing performed on each field. The IBM Classification Module uses the dictionary entries to analyze and classify documents.

    In the console tree, select Dictionary.

    In IBM Classification Module 8.5, the Classification Module Dictionary provides three predefined fields: Body, FileName, and Title, which are sufficient in this example. In case you need to add additional dictionary entries for other deployments, click Add on the toolbar to add a new dictionary entry.

    Figure 2-3-4. Classification Module Dictionary
  6. If new dictionary entries are added, the KB should be restarted to recognize the changes. In the console tree, select Knowledge Bases, and right-click on the newly added KB to restart the KB.
    Figure 2-3-5. Restart knowledge base
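
Step 5 describes a document as a series of name-value pairs whose names must be dictionary fields. The sketch below illustrates that idea with the three predefined fields and the property mapping that Table 1 in Phase 3 uses. It is a plain-Python illustration, not the Classification API: the function name and the error handling are assumptions.

```python
# Fields the article says are predefined in the Classification Module Dictionary.
DICTIONARY_FIELDS = {"Body", "FileName", "Title"}

# Mapping from file system document properties to dictionary fields (Table 1).
PROPERTY_TO_FIELD = {
    "Document Content": "Body",
    "Document Filename": "FileName",
    "Document Title": "Title",
}


def to_name_value_pairs(document_properties: dict) -> dict:
    """Build the field/value pairs for one document.

    Properties without a mapping are left out; a mapping to a field that the
    dictionary does not define is reported as an error.
    """
    pairs = {}
    for prop, value in document_properties.items():
        field = PROPERTY_TO_FIELD.get(prop)
        if field is None:
            continue
        if field not in DICTIONARY_FIELDS:
            raise KeyError(f"{field!r} is not a dictionary entry")
        pairs[field] = value
    return pairs
```

If you map a property to a field that the dictionary does not define, add the dictionary entry and restart the KB as described in step 6.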

Phase 3. Classify new content for IBM FileNet P8

This section illustrates how to use the Classifier to scan new documents in a specified directory and automatically:

  • Classify documents into the right folders in a FileNet P8 object store
  • Reject documents that do not belong to a FileNet P8 object store
  • Set aside a configured percentage of documents for manual review

Before you begin, ensure that:

  • The IBM Classification Module server and KB instance are running. You can verify it through the Classification Manager.
  • IBM FileNet P8 is running.
  1. Launch the Classifier tool by navigating to Start > Programs > IBM Classification Module 8.5 > Classifier, and log on with your IBM FileNet P8 user name and password. This FileNet P8 user ID must have read and write permission on all folders in FileNet P8 that are required for classification.
  2. On the General tab, set the following global properties:
    • Global FileNet P8 settings:
      • FileNet P8 Object Store for all documents: QA
      • Folder containing documents to be classified: ICM_Classifier_Input
      • Folder to place documents after they are classified: ReviewToolInput
        Figure 3-1. Define global FileNet P8 settings
    • Classification type:
      Select the check box Folder.
      Figure 3-2. Define classification type
    • Content to classify:
      Select The file system with the directory F:\Test, where all the new documents reside.
      Figure 3-3. Define the content to classify
    • IBM Classification Module settings:
      Set the IBM Classification Module listener URL to http://localhost:18087, the same URL that you defined during the installation.
      Figure 3-4. Define the IBM Classification Module settings
  3. On the Folder Classification tab, go to the Runtime Settings tab and set the following properties:
    • Scoring:
      • Classification Module knowledge base to use for scoring: FocusPlus
      • Maximum number of folders to suggest: 5
      • Don't suggest folders with scores less than: 0

      Figure 3-5. Define scoring properties
    • Default folders:
      • When a document doesn't match any folders, put it in: ICM_No_Matches
      • When an error occurs while classifying a document, put it in: ICM_Classification_Error
      Figure 3-6. Define default folder properties
  4. On the Folder Classification tab, go to the Document Properties tab and set the following properties:

    Table 1. File system document properties

    Document Property     Classification Module Field
    Document Content      Body
    Document Filename     FileName
    Document Title        Title

    Figure 3-7. Define the document properties
  5. On the Folder Classification tab, go to the Auto-move tab and set the following properties:
    • Select the check box Allow Classifier to automatically move documents during classification.
    • Also select the check boxes Enable auto-classify and Enable auto-reject.
      Figure 3-8. Define the auto-move properties
    • Auto-move thresholds:
      • Select Use the same thresholds for all folders when determining whether to accept or reject documents.
      • Automatically classify documents when the score for the top folder is above: 60
      • Automatically reject documents when the score for the top folder is below: 20
        Figure 3-9. Define auto-move thresholds
    • Audit percentage:
      • Percentage of auto-classified documents to be sent to the Classification Review Tool for auditing: 50
      • Percentage of auto-rejected documents to be sent to the Classification Review Tool for auditing: 50
        Note: You might want to set high audit percentages initially, then lower them when you are more confident in the system's classification ability.
        Figure 3-10. Define audit percentage
    • Auto-reject folder:
      When a document is auto-rejected, put it in this folder: ICM_Auto_Rejected
      Figure 3-11. Define the auto-reject folder
  6. Click Save to save changes to the Classifier property file.
  7. Click Start to classify documents. The Classifier scans all the files in the input directory and does the following:
    • Automatically classifies documents into the right folder in FileNet P8
    • Automatically rejects documents that do not belong in FileNet P8
    • Moves a configured percentage of documents into the Classification Review Tool folder for manual review
      Figure 3-12. Action buttons description
  8. The monitoring area shows information such as how many documents were auto-classified or auto-rejected and their destination folders.
    Figure 3-13. Monitoring area
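
The auto-move thresholds and audit percentages interact, so it can help to see them as one decision. The sketch below uses the example values (60, 20, and 50 percent). It is a reading of the settings, not the Classifier's implementation: the function names are hypothetical, the treatment of scores between the two thresholds as "send to review" is an assumption, and so is the use of random sampling to pick the audited documents.

```python
import random


def decide(top_score: float, accept_above: float = 60, reject_below: float = 20) -> str:
    """Classify the outcome for one document from its top folder score."""
    if top_score > accept_above:
        return "auto-classify"
    if top_score < reject_below:
        return "auto-reject"
    return "review"  # assumed: anything in between waits for manual review


def route(top_score: float, audit_classified: float = 50,
          audit_rejected: float = 50, rng=random) -> tuple:
    """Return (action, goes_to_review_tool) for one document."""
    action = decide(top_score)
    if action == "review":
        return action, True
    percentage = audit_classified if action == "auto-classify" else audit_rejected
    # Assumed sampling rule: audit roughly `percentage` percent of documents.
    return action, rng.random() * 100 < percentage
```

With the example settings, a document that scores 75 is auto-classified and has about an even chance of also being queued for audit, while a document that scores 40 is neither accepted nor rejected automatically.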

Phase 4. Review content classification

This section illustrates how to use the Classification Review Tool to manually confirm or correct automatic classification.

Before you begin, ensure that:

  • The IBM Classification Module server and KB instance are running. You can verify it through the Classification Manager.
  • The Classification Review Tool Web application server is up and running. In this example, start the Apache Tomcat Web application server.
  • IBM FileNet P8 is running.
  1. Configure the Classification Review Tool by editing the ReviewTool.properties file either in the F:\Apache\Tomcat5.0\webapps\ReviewTool\WEB-INF\classes\com\ibm\ICMP8\resources directory, or in the folder specified in the ReviewTool.xml file in the F:\Apache\Tomcat5.0\conf\Catalina\localhost directory. You must properly set these properties before you run the Classification Review Tool. The location of the properties file depends on whether you are running the Classification Review Tool with an Apache Tomcat installation or with IBM WebSphere Application Server. You can use the default values for most of the properties. In case you need to update any property, detailed property description can be found in the Integration for IBM FileNet P8 User's Guide. In particular, the Classification Review Tool uses some settings from the Classifier property file, such as FileNet P8 folder locations.
    reviewTool.classifier.propertyfile=${reviewTool.baseDir}/ECMTools/conf/Classifier.properties
  2. Launch the Classification Review Tool by navigating to Start > Programs > IBM Classification Module 8.5 > Classification Review Tool. In any environment, you can start the Classification Review Tool by entering the following Web address in a browser:

    http://Web_application_server_name or IP_address:port_number/ReviewTool/index.jsp

  3. On the Sign in window, enter your IBM FileNet P8 user name and password. This FileNet P8 user ID must have read and write permission on all relevant FileNet P8 folders.
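
The reviewTool.classifier.propertyfile value in step 1 contains a ${reviewTool.baseDir} placeholder. How the Classification Review Tool resolves it internally is not described in this article; the sketch below only illustrates the usual kind of substitution so that you can work out which file the setting ends up pointing to. The function name is hypothetical, and the base directory F:/IBM/ICM in the usage note is an assumption based on the installation path used in this example.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value: str, variables: dict) -> str:
    """Replace each ${name} in value with variables[name]."""
    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(lookup, value)
```

For example, expand("${reviewTool.baseDir}/ECMTools/conf/Classifier.properties", {"reviewTool.baseDir": "F:/IBM/ICM"}) yields a path under the ECMTools\conf directory used earlier for the other property files.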

Single document view

  1. After you log in to the Classification Review Tool, you are in the single document view by default. The documents in this view are taken from the Classifier output folder ReviewToolInput. You can navigate and view documents in the review queue one by one by clicking Next or Previous.
    Figure 4-1. Classification Review Tool -- Single document view
  2. After navigation, click Start Over to set the view back to the first document.
  3. For each document, the system tells why the document is in the review queue. Click Show on the Additional Document Information bar.
    Figure 4-2. Document information
  4. Click the document type icon or the document name to review the document content.
    Figure 4-3. View document content
  5. Review the folders in the Suggested and Selected lists for the given document. The Suggested list displays the top classification folders that the IBM Classification Module determined to be most appropriate for the document. The Selected list displays the suggested folder with the highest relevancy score by default.
    Figure 4-4. Suggested and Selected folder lists
  6. Depending on your evaluation of the document and the system's proposed classification, you can accept the classification recommendation (by clicking Apply Classification), delete or reclassify the document (by clicking Delete Document or Send to Classifier), or send the document to the Taxonomy Proposer for further processing (by clicking Mark as Unknown). In this example, click the Apply Classification link to move the document to the selected folder in the FileNet P8 object store. This action in turn provides feedback that allows the system to classify more accurately in the future and to keep up with the evolution of your taxonomy definition.
  7. After the Apply Classification operation, the top of the browser displays the second document in the review queue, and the number of documents in the review queue has decreased from 78 to 77 along with the message, "The document was successfully classified to folder(s) HR."
    Figure 4-5. Successful document classification
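
The relationship between the Suggested and Selected lists in step 5 can be pictured with a small sketch. This is not the IBM Classification Module API: the function name `suggest_folders`, the `top_n` parameter, and the sample scores and folder names other than HR are hypothetical, chosen only to illustrate how a ranked folder list and a default selection might be derived from relevancy scores.

```python
from typing import Dict, List, Tuple


def suggest_folders(scores: Dict[str, float], top_n: int = 3) -> Tuple[List[str], List[str]]:
    """Return (suggested, selected) folder lists for one document.

    scores maps a folder name to a relevancy score (hypothetical values).
    suggested holds the top_n folders, highest score first.
    selected holds only the best-scoring folder, mirroring the default
    described in step 5.
    """
    ranked = sorted(scores, key=lambda folder: scores[folder], reverse=True)
    suggested = ranked[:top_n]
    selected = suggested[:1]
    return suggested, selected


# Hypothetical relevancy scores for one document in the review queue.
example_scores = {"HR": 0.82, "Finance": 0.41, "Legal": 0.12, "Sales": 0.05}
suggested, selected = suggest_folders(example_scores)
print(suggested)  # ['HR', 'Finance', 'Legal']
print(selected)   # ['HR']
```

A reviewer who disagrees with the default would change the Selected list before clicking Apply Classification; the sketch models only the starting state.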

Document list view

  1. To act on several documents at once, use the document list view.
  2. Click View: Document list to switch to the document list view.
  3. Review the folder that appears in the Move to column. This is how the IBM Classification Module intends to classify the document.
  4. Select the check boxes for the documents to which you want to apply a bulk operation. Selecting the check box in the table title row selects all documents on the page.
    Figure 4-6. Classification Review Tool -- Document list view
  5. After you select multiple documents, you can perform the same operations as in the single document view, except that the Send to Classifier action is not available in the document list view.
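
As a rough illustration of the bulk behavior described above, the following sketch models the check boxes, the Move to column, and the set of actions available in each view. The function `plan_bulk_action`, the row layout, and the document names are hypothetical; only the action names and the absence of Send to Classifier in list view come from the steps above.

```python
from typing import Dict, List, Set, Tuple

# Actions the article names for the single document view.
SINGLE_VIEW_ACTIONS: Set[str] = {
    "Apply Classification",
    "Delete Document",
    "Send to Classifier",
    "Mark as Unknown",
}
# The document list view offers the same actions minus Send to Classifier.
LIST_VIEW_ACTIONS: Set[str] = SINGLE_VIEW_ACTIONS - {"Send to Classifier"}


def plan_bulk_action(
    rows: List[Dict[str, str]], checked: Set[str], action: str
) -> List[Tuple[str, str, str]]:
    """List (document, action, move_to) for every checked row.

    The move_to folder matters only for Apply Classification; it is
    carried along for the other actions purely for display.
    """
    if action not in LIST_VIEW_ACTIONS:
        raise ValueError(f"{action!r} is not available in document list view")
    return [(row["name"], action, row["move_to"]) for row in rows if row["name"] in checked]


# Hypothetical rows as they might appear in the document list view.
rows = [
    {"name": "policy.doc", "move_to": "HR"},
    {"name": "invoice.pdf", "move_to": "Finance"},
    {"name": "memo.txt", "move_to": "HR"},
]
plan = plan_bulk_action(rows, {"policy.doc", "memo.txt"}, "Apply Classification")
print(plan)
```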

Add new documents

  1. You can add new documents one at a time to the IBM FileNet P8 repository, and accept or modify the classification that the IBM Classification Module suggests for the document.
  2. In the sidebar, click Add New Document.
  3. Browse to the document you'd like to add, and click Add.
    Figure 4-7. Add document
  4. The document information is displayed, and the document is classified.
    Figure 4-8. Document successfully added
  5. You can process this new document as you would process any document in the review queue in the single document view.

Summary

This article introduced the steps for integrating IBM Classification Module with IBM FileNet P8 in a Windows environment. It first reviewed the integration architecture and workflow. It then used step-by-step screen shots to illustrate how to use the IBM ECM Classification tools, along with the IBM Classification Module server component and the Classification Workbench, to automate content classification in the integrated environment.

In summary, while ingesting new content into IBM FileNet P8, a typical workflow to automate content classification in the integrated environment is:

  • Train the IBM Classification Module system with IBM FileNet P8 content by using the Content Extractor to extract sample content from a FileNet P8 repository, creating and analyzing a knowledge base in the Classification Workbench, and registering the newly created KB with the classification engine through the Classification Manager.

    You can rapidly train the IBM Classification Module based on the categories already created in your IBM FileNet P8 repository, as described above. If you don't have a set of categories ready to go, the IBM Classification Module's Taxonomy Proposer can divide an existing set of content into logical groupings, recommending a new set of categories and corresponding names. You can learn more about the Taxonomy Proposer by referring to the Integration for IBM FileNet P8 User's Guide.

  • Classify new content for IBM FileNet P8 by using the Classifier tool to classify documents into the right folders or document classes in FileNet P8, reject documents with low relevancy scores, and set aside a configured percentage of documents for manual review or audit.
  • Review content classification to ensure accuracy by using the Classification Review Tool to manually confirm or correct automatic classifications.
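
The Classifier step in this workflow (file the document, reject it for a low relevancy score, or set it aside for review) can be sketched as a simple routing function. This is a minimal illustration, not the Classifier tool's actual logic or configuration: the function `route_document`, the `threshold` and `review_fraction` parameters, the `REJECTED` label, and the order in which the two checks run are all assumptions. Only the ReviewToolInput folder name comes from this article.

```python
import random
from typing import Optional

REVIEW_FOLDER = "ReviewToolInput"  # Classifier output folder read by the Review Tool
REJECTED = "rejected"              # hypothetical label for low-score documents


def route_document(
    best_folder: str,
    score: float,
    threshold: float,
    review_fraction: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Decide where one classified document goes (assumed ordering).

    1. A score below threshold rejects the document.
    2. Otherwise, with probability review_fraction, the document is
       set aside for manual review or audit.
    3. Otherwise it is filed in the best-matching folder.
    """
    rng = rng or random.Random()
    if score < threshold:
        return REJECTED
    if rng.random() < review_fraction:
        return REVIEW_FOLDER
    return best_folder


# With review_fraction=0.0 nothing is sampled, so a confident match is filed directly.
print(route_document("HR", score=0.9, threshold=0.5, review_fraction=0.0))  # HR
```

Raising `review_fraction` sends more confidently classified documents to reviewers, which trades manual effort for a larger audit sample.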

Published: January 31, 2008