Question & Answer
Question
How do I select which archived documents to assign retention schedules to? What is the correct syntax to use in the QueryString configuration option? Are there any constraints on how to construct a query string?
Answer
The Set Retention tool is the IBM Content Collector Retention Manager tool to use to apply retention schedules to archived documents that were not assigned retention schedules during archiving.
To specify which documents in the repository you want to assign a retention schedule to you must define the Set Retention configuration option called QueryString. The QueryString option must be defined in the properties file of the Set Retention tool. Depending on your target repository system, you must modify either the afu_SetRetention_Config_CM8.properties file for an IBM Content Manager repository or the afu_SetRetention_Config_P8.properties file for an IBM FileNet P8 repository.
In both files, the query string must conform to the query language syntax supported by the respective repository.
For queries to an IBM Content Manager repository, the query string must be an XQuery Path Expression (XQPE). For more information on XQPE including useful query examples, see http://pic.dhe.ibm.com/infocenter/cmgmt/v8r5m0/index.jsp?topic=%2Fcom.ibm.cmgmt.doc%2Fic-homepage.html and navigate to Developing > Programming content management applications > Searching for data > Example searches by using the query language.
For queries to an IBM FileNet P8 repository, the query string must conform with the FileNet P8 Content Engine query syntax that is used by the Content Engine Java API. This syntax generally conforms to the SQL-92 standard and is essentially an SQL SELECT statement. For more information on the P8 query syntax and examples, see http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp and navigate to Developing FileNet P8 applications > Content Engine Development > Content Engine Java and .Net Developer's Guide > Reference > SQL Syntax Reference.
IBM Content Manager query string details and samples
The QueryString value for IBM Content Manager contains an XQuery Path Expression that starts with the name of the item type data model from which you want to select documents. You specify the document selection by defining conditions on document attributes. You reference these attributes by using the @-operator (@). The item type data model and the attributes for the item type are different for email and file system documents. For email documents, the default IBM Content Collector item types are ICCEmailCmpLD (Lotus Domino), ICCEmailCmpEX (Microsoft Exchange), and ICCSMTPCmp (SMTP/MIME). For file system documents, the default Content Collector item type is called ICCFilesystem. The general syntax is:
/item_type[condition clause]
The following QueryString examples use the default Content Collector item type for Domino email documents called ICCEmailCmpLD.
To find email documents in the repository where the value of the attribute ICCMailDate is less than or equal to the 7th of August 2013, specify:
QueryString = /ICCEmailCmpLD[@ICCMailDate <= "2013-08-07"]
The same query for file system documents using the creation date attribute is specified in the following way:
QueryString = /ICCFilesystem[@ICCCreatedDate <= "2013-08-07"]
The following queries are more complex examples for querying email documents in an IBM Content Manager repository:
- To find email documents that were archived between the 10th of November 2013 and the 1st of December 2013 search against the attribute ICCMailDate:
/ICCEmailCmpLD[@ICCMailDate BETWEEN "2013-11-10-08.00.00.000000" AND "2013-12-01-07.59.59.000000"]
- To find email documents that contain the keyword "test" in the email content, define a full-text search query on the email content:
/ICCEmailCmpLD[(contains-text(@TIEREF, "EXPANSION LIMIT 20000 SECTIONS('content') ((('test')))")=1)]
- To find email documents that contain the keyword "test" in the subject, define a full-text search query on the email subject:
/ICCEmailCmpLD[(contains-text(@TIEREF, "EXPANSION LIMIT 20000 SECTIONS('subject') ((('test')))")=1)]
- To select email documents based on the sender or recipients of email documents, define a full-text search query against the sender or recipients:
/ICCEmailCmpLD[(contains-text(@TIEREF, "EXPANSION LIMIT 20000 SECTIONS('from') ((('phillip.allen@enron.com')))")=1)]
/ICCEmailCmpLD[(contains-text(@TIEREF, "EXPANSION LIMIT 20000 SECTIONS('recipient_addresses') ((('phillip.allen@enron.com')))")=1)]
- To find email instances based on the mailbox from which it is archived, define a attribute query against the attribute ICCMailboxID.
- /ICCEmailCmpLD[AFUEChild[@ICCMailboxID="88257B02:00251903"]]
Note: There is an important constraint when running the Set Retention tool that you should be aware of when you plan to update archived email documents based on an email instance repository query (-ei). If you want to re-apply schedules on archived email that have duplicates, that means, on documents that have more than one email instance, but the schedule does not affect all email instances, you must not use the -update -ei arguments when you run the Set Retention tool. In this case, you should obtain the item IDs of those email instances that you want to work on and use -processid -ei to process only those instances affected by the schedule.
The following queries are more complex examples for querying file system documents in an IBM Content Manager repository:
- To find file system documents that were created between the 10th of November 2013 and the 1st of December 2013, define a query that runs against the created date of a file:
/ICCFilesystem[@ICCCreatedDate BETWEEN "2013-11-10-08.00.00.000000" AND "2013-12-01-07.59.59.000000"]
- To find those file system documents in one directory that contain the term "test", define a file path query:
/ICCFilesystem[(contains-text(@TIEREF, "EXPANSION LIMIT 20000 SECTIONS('generic_content') ((('test')))")=1)]
IBM FileNet P8 query details and string samples
The QueryString value for IBM FileNet P8 is basically an SQL SELECT statement on the document class and you specify the document selection via the WHERE clause. Select all columns because the Set Retention tool always applies filters before executing the query that restrict the query to only those columns that are actually needed. Document classes and attributes are different for email and file system documents. For email documents (DEIs), the default Content Collector document class is called ICCMail3. For email instances (EIs), the default Content Collector document class is called ICCMailInstance3, which is the email instance class of ICCMail3. For file system documents, the default Content Collector document class is called ICCFileInstance2. If you use a different document class name for file system documents, you must use the name that you specified.
In most cases you can select the archived documents based on attribute values. The following examples show the syntax of the query strings. Select the email sample query depending on which set retention data model component argument (-dei, -ei, or -fi) is used when you run the Set Retention tool.
For email (distinct email instances (DEIs)):
SELECT DEI.* FROM <email class> DEI WHERE <condition clause>
For email (email instances (EIs)):
SELECT EI.* FROM <email instance class> EI WHERE <condition clause>
Note: For EI queries, the only useful attribute to select is probably ICCMailboxID. You could join this table with the email class table for a finer grained selection, but be aware of the performance impact that this has.
For file system documents:
SELECT FI.* FROM <file instance class> FI WHERE <condition clause>
A query that selects email or file system documents based on a full-text search could be defined in the following way:
For email documents (single keyword):
SELECT DEI.this, DEI.* FROM <email instance class> DEI INNER JOIN ContentSearch cs ON DEI.this = cs.QueriedObject WHERE CONTAINS(*,'(@xmlxp:''//icc_content[.contains("""<keyword>""")]'')')
For files (that have been indexed):
SELECT FI.* FROM <file instance class> FI INNER JOIN ContentSearch cs
ON FI.This = cs.QueriedObject WHERE CONTAINS(*, '(("<keyword>"))')
OPTIONS (FULLTEXTROWLIMIT 750)
The following queries are examples for querying email documents in a FileNet P8 repository:
- To find email documents that were archived between the 8th of October 2013 and the 6th of November 2013 search against the property ICCMailDate:
SELECT DEI.* from ICCMail3 DEI WHERE DEI.ICCMailDate >= 2013-10-08T07:00:00.0000 AND DEI.ICCMailDate <= 2013-11-06T07:59:59.0000
- To find email documents that mention "test" in the content body or subject, define a full-text search query on the email content or subject with keywords:
SELECT DEI.* from ICCMail3 DEI INNER JOIN ContentSearch CS ON DEI.This = CS.QueriedObject WHERE CONTAINS(*, '(@xmlxp:''//icc_content[.contains("""test""")]'')')
SELECT DEI.* from ICCMail3 DEI INNER JOIN ContentSearch CS ON DEI.This = CS.QueriedObject WHERE CONTAINS(*, '(@xmlxp:''//icc_subject[.contains("""test""")]'')')
- To select email documents based on the sender or recipients of the documents, define a full-text search query against the sender or recipients:
SELECT DEI.this, DEI.* FROM ICCMail3 DEI INNER JOIN ContentSearch cs ON DEI.This =
cs.QueriedObject WHERE CONTAINS(*, '(@xmlxp:''//icc_recipients[.contains("""test1\/ibm""")]'')')
- To select email documents based on the ICCMailboxID value of the documents, search against the property ICCMailboxID:
- For email from a Lotus Domino server:
SELECT EI.* FROM ICCMailInstance3 EI where EI.ICCMailboxID='88257C0C:004A8EA6' order by EI.ICCMailReference
For email from a Microsoft Exchange server:
SELECT EI.* FROM ICCMailInstance3 EI where EI.ICCMailboxID='ebe6addbdf9bb2d6bf8030b2af9f072d' order by EI.ICCMailReference
- If the search query runs against an email instance (EI) class, for example, ICCMailInstance3, you must include the sub clause "order by ICCMailReference" in the query string, The reason is that all the ICCMailInstanceXX (EI) instances for the same email must be grouped together in the query result. If the EIs of duplicated email documents are not grouped together and the update of the same ICCMailXX (DEI) instance is submitted more than once after it is retrieved on the client side, an error will occur while updating the same DEI.
- When you specify the QueryString option, you must make sure that the document class that the query runs against is consistent with the data model component argument that you specified in the Set Retention tool command line. For example, if you specified -dei in the command line as an argument, the query string that you specify must run against the DEI class that you specified in the configuration file.
The query strings in the following three examples can be used to find files in which given attributes match a specific condition (the query is based on attribute values). Example 4 leverages full text search on the plain text from files.
1. Find file system documents which have been created before Nov 6th, 2013:
- Select FI.* from ICCFileInstance2 FI where FI.ICCCreatedDate < 20131106T075959Z
2. Find file system documents for which their IDs match the values in the list
- Select FI.* from ICCFileInstance2 FI where FI.id IN ({48CD1100-55F9-43D3-B8B2-C39E8727F44F}, {F24A1300-EF1E-4AD6-B4CB-1B9CE9603ED3})
3. Find file system documents for which the file path starts with "\\myhost\dir1\subdir1"
- Select FI.* from ICCFileInstance2 FI where FI.ICCFilePath like
'\\\\myhost\\dir1\\subdir1\\%'
Note that the backslash character is used as an escape character which means that you have to use "\\" for a single backslash.
4. To find indexed files that mention the keyword "confidential" in the file content, define a full-text search query:
- SELECT FI.* FROM ICCFileInstance2 FI INNER JOIN ContentSearch cs ON FI.This = cs.QueriedObject WHERE CONTAINS(*, '(("confidential"))') OPTIONS (FULLTEXTROWLIMIT 750)
Was this topic helpful?
Document Information
Modified date:
17 June 2018
UID
swg21655895