Searching SharePoint documents (FileNet P8)

When you use IBM® eDiscovery Manager to search Microsoft SharePoint items that are archived by IBM Content Collector, Version 3.0 or later, refer to these search configuration options for IBM FileNet® P8.

IBM Content Collector creates a default document class with the following name and properties.

Document class symbolic name:

ICCSharepointInstance2

Document properties:

DocumentTitle
ICCCreatedBy
ICCCreatedDate
ICCExpirationDate
ICCFileName
ICCFilePath
ICCFolderPath
ICCLastModifiedDate
ICCLibrary
ICCModifiedBy
ICCSharePointGUID
ICCSharePointVersion
ICCSite

To use multi-part content that IBM Content Collector, Version 3.0 or later, archived from SharePoint into an IBM FileNet P8 repository, create a new eDiscovery Manager collection with the collection type "Microsoft SharePoint - Content Collector". This collection type provides a starting point that you can modify and extend to create a richer set of field definitions. Initially, the collection has the following field definitions:

Table 1. Collection fields for the new eDiscovery Manager collection using the collection type "Microsoft SharePoint - Content Collector"
Collection field	Content server property	Type
EXTERNAL_ID	Id	String
CREATED_DATE	ICCCreatedDate	Date
MODIFIED_DATE	ICCLastModifiedDate	Date
EXPIRATION_DATE	ICCExpirationDate	String
LIBRARY	ICCLibrary	String
SITE	ICCSite	String
SHAREPOINT_VERSION	ICCSharePointVersion	String
FOLDER_PATH	ICCFolderPath	String
FILE_NAME	ICCFileName	String
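
The mapping in Table 1 can be pictured as a simple lookup from collection field to content server property. The following Python sketch is only an illustration of that mapping; the `to_collection_fields` helper and the sample property values are hypothetical and are not part of eDiscovery Manager or FileNet P8.

```python
# Field mapping from Table 1: collection field -> (content server property, type).
FIELD_MAP = {
    "EXTERNAL_ID": ("Id", "String"),
    "CREATED_DATE": ("ICCCreatedDate", "Date"),
    "MODIFIED_DATE": ("ICCLastModifiedDate", "Date"),
    "EXPIRATION_DATE": ("ICCExpirationDate", "String"),
    "LIBRARY": ("ICCLibrary", "String"),
    "SITE": ("ICCSite", "String"),
    "SHAREPOINT_VERSION": ("ICCSharePointVersion", "String"),
    "FOLDER_PATH": ("ICCFolderPath", "String"),
    "FILE_NAME": ("ICCFileName", "String"),
}


def to_collection_fields(properties):
    """Rename content server properties to collection field names.

    Hypothetical helper: properties that are absent from the input or
    that have no entry in FIELD_MAP are skipped.
    """
    return {
        field: properties[prop]
        for field, (prop, _type) in FIELD_MAP.items()
        if prop in properties
    }


# Made-up property values, for illustration only.
sample = {"Id": "{A1}", "ICCSite": "http://sp/sites/hr", "Other": 1}
print(to_collection_fields(sample))
```

Properties that are not listed in Table 1, such as `Other` in the sample, are ignored by this sketch.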

Delete the definition for CONTENT, and then add the following definitions to obtain the complete set of new field definitions:

Table 2. Definitions that need to be added to get to the full, new field definitions
Collection field	Type	Text index	Description
CONTENT	String	//icc_content	Matches all of the content of a SharePoint item, including attachments.
RAW_CONTENT	String	$FULL_TEXT$	Matches all of the content and XML tags.
DOCUMENT	String	//icc_main	Matches primary file content only, whether file or HTML rendering.
PRIMARY_FILE_NAME	String	//icc_main @name	Matches primary file, file name only.
ATTACHMENT	String	//icc_attachment	Matches attachment content only.
ATTACHMENT_NAME	String	/icc_attachment @name	Matches attachment file name only.
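
To see what each text index expression in Table 2 is meant to select, it can help to try the element names against a small XML sample. The following Python sketch assumes a layout in which one `icc_content` element contains one `icc_main` element and any number of `icc_attachment` elements, each with a `name` attribute. That layout is inferred from the expressions in the table and is not confirmed here; the real XML that IBM Content Collector produces might differ. The sketch uses Python's standard `xml.etree.ElementTree` module, not the FileNet P8 search engine.

```python
import xml.etree.ElementTree as ET

# Assumed sample layout; element nesting and file names are illustrative only.
SAMPLE = """
<icc_content>
  <icc_main name="plan.docx">Project plan for the quarter</icc_main>
  <icc_attachment name="budget.xlsx">Budget figures</icc_attachment>
  <icc_attachment name="notes.txt">Meeting notes</icc_attachment>
</icc_content>
"""

root = ET.fromstring(SAMPLE)


def text_of(tag):
    """Return the whitespace-normalized text under every element named tag."""
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]


def names_of(tag):
    """Return the name attribute of every element named tag."""
    return [el.get("name") for el in root.iter(tag)]


print("CONTENT:", text_of("icc_content"))
print("DOCUMENT:", text_of("icc_main"))
print("PRIMARY_FILE_NAME:", names_of("icc_main"))
print("ATTACHMENT:", text_of("icc_attachment"))
print("ATTACHMENT_NAME:", names_of("icc_attachment"))
```

In this sample, the CONTENT line covers the primary file and both attachments, while the DOCUMENT and ATTACHMENT lines cover only their own elements.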

XPath syntax supported in field mappings

The supported subset of XPath is defined by the XPath support of the CSS XML search engine. It differs from standard XPath in the following ways:

It does not support iteration and ranges in path expressions.
It eliminates filter expressions: that is, it allows filtering only in the predicate expression, not in the path expression.
It does not allow absolute path names in predicate expressions.
It implements only one axis (tag) and allows propagation only in the forward direction.

The following wildcard expressions are not supported in the XML search syntax:

/*
//*
/@*
//@*
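
Before you enter a text index expression in a field definition, you might want to check it for the unsupported patterns listed above. The following Python sketch is a naive substring check. The function name is hypothetical, and the check does not parse XPath, so it also flags these patterns when they occur inside a quoted string.

```python
# The four unsupported patterns listed above.
UNSUPPORTED_PATTERNS = ("/*", "//*", "/@*", "//@*")


def uses_unsupported_wildcard(expr):
    """Return True if expr contains one of the unsupported wildcard patterns.

    Whitespace is removed first so that "/ @*" is treated like "/@*".
    """
    compact = "".join(expr.split())
    return any(pattern in compact for pattern in UNSUPPORTED_PATTERNS)


for candidate in ("//icc_main", "//icc_attachment/@*", "//*"):
    print(candidate, uses_unsupported_wildcard(candidate))
```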

Disregarding of XML namespaces

Namespace prefixes are not retained in the indexing of XML tag and attribute names. You can search XML documents by using namespaces, but namespace prefixes are discarded during indexing and removed from XML search queries.
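
The effect on a query can be approximated by removing the prefixes from the expression yourself. The following Python sketch shows that idea with a regular expression. It is an approximation for illustration, not the search engine's own logic; the function name and the sample expressions are hypothetical, and the pattern would also alter a prefixed name that appears inside a quoted string.

```python
import re

# A prefix is a name followed by a single colon and the start of another name.
# The lookahead keeps axis separators ("::") and URLs ("http://") untouched.
_PREFIX = re.compile(r"\b[A-Za-z_][\w.\-]*:(?=[A-Za-z_])")


def strip_namespace_prefixes(expr):
    """Remove namespace prefixes from element and attribute names in expr."""
    return _PREFIX.sub("", expr)


print(strip_namespace_prefixes("//dc:title"))
print(strip_namespace_prefixes('//meta[@xml:lang = "en"]'))
```

With this sketch, `//dc:title` and `//title` reduce to the same expression, which mirrors the statement that prefixes play no role in matching.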

Numeric values

Predicates that compare attribute values to numbers are supported.

Complete match

The operator = (equal sign) with a string argument in a predicate means that a complete match of all tokens in the string with all tokens in the identified text span is required. The order of the tokens is important.
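
The complete-match rule can be sketched as a comparison of two token sequences. The following Python example assumes that tokens are separated by whitespace and compared case-sensitively; both are assumptions made for illustration, because the tokenization that the search engine applies is not described here.

```python
def tokens(text):
    """Split text into tokens on whitespace (an assumed tokenization)."""
    return text.split()


def complete_match(query, span):
    """Return True only if span has exactly the tokens of query, in order."""
    return tokens(query) == tokens(span)


print(complete_match("annual report", "annual report"))       # same tokens, same order
print(complete_match("annual report", "annual report 2019"))  # span has an extra token
print(complete_match("report annual", "annual report"))       # same tokens, wrong order
```

Only the first call reports a match. An extra token in the text span or a different token order is enough to make the comparison fail.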

For more details on the XML search syntax, see the FileNet P8 topic "SQL Syntax Reference" and go to the "XML Search" section.