Searching SharePoint documents (FileNet P8)

When you use IBM® eDiscovery Manager to search Microsoft SharePoint items that are archived by IBM Content Collector, Version 3.0 or later, refer to these search configuration options for IBM FileNet® P8.

IBM Content Collector creates a default document class with the following name and properties.

Document class symbolic name:
  • ICCSharepointInstance2
Document properties:
  • DocumentTitle
  • ICCCreatedBy
  • ICCCreatedDate
  • ICCExpirationDate
  • ICCFileName
  • ICCFilePath
  • ICCFolderPath
  • ICCLastModifiedDate
  • ICCLibrary
  • ICCModifiedBy
  • ICCSharePointGUID
  • ICCSharePointVersion
  • ICCSite

To use multi-part content that IBM Content Collector, Version 3.0 or later, archived from SharePoint into an IBM FileNet P8 repository, create a new eDiscovery Manager collection with the collection type "Microsoft SharePoint - Content Collector". This collection type is a starting point that you can modify and extend to create a richer set of field definitions. Initially, the collection contains the following field definitions:

Table 1. Collection fields for the new eDiscovery Manager collection using the collection type "Microsoft SharePoint - Content Collector"
Collection field     Content server property   Type     Text index
EXTERNAL_ID          Id                        String
CREATED_DATE         ICCCreatedDate            Date
MODIFIED_DATE        ICCLastModifiedDate       Date
EXPIRATION_DATE      ICCExpirationDate         String
LIBRARY              ICCLibrary                String
SITE                 ICCSite                   String
SHAREPOINT_VERSION   ICCSharePointVersion      String
FOLDER_PATH          ICCFolderPath             String
FILE_NAME            ICCFileName               String
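
The initial mapping in Table 1 can be pictured as a simple lookup from collection field to content server property. The Python sketch below only illustrates that mapping; the names INITIAL_FIELDS and property_for are hypothetical and are not part of eDiscovery Manager.

```python
# Collection field -> (content server property, type), copied from Table 1.
# INITIAL_FIELDS and property_for are hypothetical names used for illustration.
INITIAL_FIELDS = {
    "EXTERNAL_ID": ("Id", "String"),
    "CREATED_DATE": ("ICCCreatedDate", "Date"),
    "MODIFIED_DATE": ("ICCLastModifiedDate", "Date"),
    "EXPIRATION_DATE": ("ICCExpirationDate", "String"),
    "LIBRARY": ("ICCLibrary", "String"),
    "SITE": ("ICCSite", "String"),
    "SHAREPOINT_VERSION": ("ICCSharePointVersion", "String"),
    "FOLDER_PATH": ("ICCFolderPath", "String"),
    "FILE_NAME": ("ICCFileName", "String"),
}


def property_for(collection_field):
    """Return the content server property behind a collection field."""
    prop, _field_type = INITIAL_FIELDS[collection_field]
    return prop
```

For example, property_for("SITE") looks up the SITE row and returns the property name ICCSite.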

Delete the definition for CONTENT, and then add the following definitions to complete the new set of field definitions:

Table 2. Definitions that need to be added to get to the full, new field definitions
Collection field    Type     Text index               Description
CONTENT             String   //icc_content            Matches all of the content of a SharePoint item, including attachments.
RAW_CONTENT         String   $FULL_TEXT$              Matches all of the content and XML tags.
DOCUMENT            String   //icc_main               Matches primary file content only, whether file or HTML rendering.
PRIMARY_FILE_NAME   String   //icc_main/@name         Matches the file name of the primary file only.
ATTACHMENT          String   //icc_attachment         Matches attachment content only.
ATTACHMENT_NAME     String   //icc_attachment/@name   Matches the attachment file name only.
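
To make the difference between these text index mappings more concrete, the sketch below evaluates comparable paths against a small XML document with Python's standard xml.etree.ElementTree module. The XML layout (an icc_content root that holds one icc_main element and several icc_attachment elements, each with a name attribute) is an assumption inferred from the paths in Table 2; the structure that IBM Content Collector actually produces may differ, and ElementTree stands in for the real text search engine for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical index document; element layout is assumed, not documented here.
SAMPLE = """
<icc_content>
  <icc_main name="report.docx">Quarterly results</icc_main>
  <icc_attachment name="figures.xlsx">Revenue table</icc_attachment>
  <icc_attachment name="notes.txt">Meeting notes</icc_attachment>
</icc_content>
"""


def field_values(xml_text):
    """Return the text that each Table 2 field would target in xml_text."""
    root = ET.fromstring(xml_text.strip())
    main = root.findall(".//icc_main")
    attachments = root.findall(".//icc_attachment")
    return {
        # All text below the root: primary file plus attachments.
        "CONTENT": " ".join(t.strip() for t in root.itertext() if t.strip()),
        # Primary file content and its name attribute.
        "DOCUMENT": [m.text for m in main],
        "PRIMARY_FILE_NAME": [m.get("name") for m in main],
        # Attachment content and name attributes.
        "ATTACHMENT": [a.text for a in attachments],
        "ATTACHMENT_NAME": [a.get("name") for a in attachments],
    }


values = field_values(SAMPLE)
```

In this sample, a term that occurs only in notes.txt would be reachable through CONTENT and ATTACHMENT but not through DOCUMENT.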

XPath syntax supported in field mappings

The supported subset of XPath is defined by the XPath support of the IBM Content Search Services (CSS) XML search engine. It differs from standard XPath in the following ways:
  • It does not support iteration and ranges in path expressions.
  • It eliminates filter expressions: that is, it allows filtering only in the predicate expression, not in the path expression.
  • It does not allow absolute path names in predicate expressions.
  • It implements only one axis (tag) and allows propagation only in the forward direction.
The following wildcard expressions are not supported in the XML search syntax:
  • /*
  • //*
  • /@*
  • //@*
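
A field mapping can be screened for these unsupported wildcard forms before it is saved. The following Python sketch is one possible check; the function name uses_unsupported_wildcard is hypothetical, and the check covers only the four forms listed above, not the other restrictions on path and predicate expressions.

```python
import re

# Matches "/*" and "/@*". Because "//*" and "//@*" end with the same
# characters, all four listed forms are caught by this one pattern.
_WILDCARD = re.compile(r"/@?\*")


def uses_unsupported_wildcard(xpath):
    """Return True if xpath contains /*, //*, /@* or //@*."""
    return _WILDCARD.search(xpath) is not None
```

With this check, //icc_main/@name passes, whereas //icc_attachment/@* is flagged.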

XML namespaces are disregarded

Namespace prefixes are not retained when XML tag and attribute names are indexed. You can include namespace prefixes when you search XML documents, but the prefixes are discarded during indexing and removed from XML search queries.
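
The effect on a query can be approximated with a few lines of Python. This is a naive sketch of the described behavior, not the search engine's implementation: the function name strip_namespace_prefixes is hypothetical, and the regular expression does not skip colons that appear inside quoted string literals.

```python
import re

# A prefix is a name followed by a single colon and the start of another
# name. The lookahead keeps axis separators such as "child::" untouched.
_PREFIX = re.compile(r"\b[A-Za-z_][\w.-]*:(?=[A-Za-z_])")


def strip_namespace_prefixes(xpath):
    """Remove namespace prefixes from tag and attribute names in xpath."""
    return _PREFIX.sub("", xpath)
```

Under this sketch, a query written with a prefix on icc_main and on its name attribute reduces to the same path as the unprefixed query //icc_main/@name.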

Numeric values

Predicates that compare attribute values to numbers are supported.

Complete match

The operator = (equal sign) with a string argument in a predicate means that a complete match of all tokens in the string with all tokens in the identified text span is required. The order of the tokens is important.
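
The complete-match rule can be sketched as a comparison of two token lists. The tokenization below (runs of word characters, compared without regard to case) is an assumption made for illustration; the tokenizer that the search engine actually uses is not described in this topic, and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Assumption: tokens are runs of letters, digits, or underscores,
    # and comparison ignores case.
    return re.findall(r"\w+", text.lower())


def complete_match(argument, text_span):
    """True only if both strings yield the same tokens in the same order."""
    return tokenize(argument) == tokenize(text_span)
```

Under these assumptions, "annual report" matches a text span of "Annual Report", but it matches neither "report annual" (different order) nor "annual report 2012" (extra token).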

For more details on the XML search syntax, see the FileNet P8 topic "SQL Syntax Reference" and go to the "XML Search" section.