Queries

Draft comment:
This topic is shared by ICS, Filenet 5.5.10. As of: 2023-05-10

The essential parts of a Content Engine search are a SQL statement, contained in a SearchSQL instance, and the object store or object stores searched, contained in a SearchScope object. Content searches are specified through the CONTAINS operator in the SQL statement.

The SQL Statement

There are helper methods on the SearchSQL class to assist you in constructing a SQL statement. Alternatively, you can construct a SQL statement independently and pass it to a SearchSQL instance as a string. SQL statements must follow the IBM® FileNet standard, which generally conforms to SQL-92, with extensions for IBM FileNet specific constructs. For a complete description, see SQL Syntax Reference.

The SearchSQL helper methods are supplied for assistance in building SQL statements and cannot provide the level of specification you can achieve with an independently constructed statement. However, in a development environment, you can use the helper methods for initial construction of the SQL statement, then use the SearchSQL.toString method to get the SQL statement string and manually refine the SQL statement.
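As a minimal sketch of the "independently constructed" approach, the following plain-Java helper assembles a statement string from its clauses. The class and method names (QuerySketch, buildSql) are hypothetical and are not part of the Content Engine API; the resulting string is what you would pass to a SearchSQL instance.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Content Engine API.
final class QuerySketch {

    // Assemble a SELECT statement from its clauses. The WHERE and ORDER BY
    // clauses are optional and are skipped when null or empty.
    static String buildSql(List<String> selectList, String className,
                           String whereClause, String orderBy) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", selectList));
        sql.append(" FROM ").append(className);
        if (whereClause != null && !whereClause.isEmpty()) {
            sql.append(" WHERE ").append(whereClause);
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // The resulting string is the kind of text passed to a SearchSQL instance.
        System.out.println(buildSql(Arrays.asList("DocumentTitle"), "Document",
                "DocumentTitle LIKE 'C%'", "DocumentTitle"));
    }
}
```

The sketch does no validation or escaping; it only shows that the statement is ordinary text that you can build and refine by hand.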

The Search Scope

The SearchScope methods run the SQL statement on one or more object stores to find objects (IndependentObject instances), database rows (RepositoryRow instances), or metadata (ClassDescription instances).

You can use the SearchScope class to search one or more object stores by using a single query. To create a query on multiple object stores, call the constructor for SearchScope with an array of object stores, similar to the following code snippet:


ObjectStore[] osArray = new ObjectStore[]{os1,os2}; 
SearchScope objStores = new SearchScope(osArray, MergeMode.INTERSECTION);

Then, use the SearchScope instance to run a query. The query merges the results from the object stores and returns them in a single, ordered list.

For example, if the SELECT DocumentTitle FROM Document WHERE DocumentTitle LIKE 'C%' ORDER BY DocumentTitle search statement is run on a list of two object stores, the results (in a single collection) might be:


Cars 
City 
Concrete 
Cows

Cars and Concrete might come from the first object store, and City and Cows might come from the second object store. Note that the results from the different object stores are intermingled in the list, ordered by the ORDER BY clause of the search statement.
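The intermingling described above can be modeled in plain Java. This sketch does not call the Content Engine API (the server performs the real merge); it only illustrates the observable outcome, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the merged result list; the server does the real work.
final class MergedResultsSketch {

    // Combine the titles found in each object store and order the combined
    // list the way "ORDER BY DocumentTitle" would (natural string order here).
    static List<String> mergeOrdered(List<List<String>> perStoreTitles) {
        List<String> merged = new ArrayList<>();
        for (List<String> titles : perStoreTitles) {
            merged.addAll(titles);
        }
        Collections.sort(merged);
        return merged;
    }

    public static void main(String[] args) {
        List<String> firstStore = Arrays.asList("Cars", "Concrete");
        List<String> secondStore = Arrays.asList("City", "Cows");
        System.out.println(mergeOrdered(Arrays.asList(firstStore, secondStore)));
    }
}
```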

Matching Classes and Properties

Classes and properties are defined in each object store. A class or property in one object store is considered to be the same class or property that exists in another object store only if the compared classes or properties have matching GUIDs. Having the same name does not indicate that the compared classes or properties are the same.

GUID values are stored in properties on both the ClassDefinition and PropertyDefinition classes.

A ClassDefinition object has both an Id property that is a GUID, and an AliasIds property that is a list of GUIDs. The Id property holds the GUID that is used to identify ClassDefinition objects. The AliasId properties can alternatively be used to identify these objects. Two ClassDefinition objects from two different object stores are considered to be the same if the value of either the Id property or AliasId property of one ClassDefinition object matches the value of the corresponding property on the other ClassDefinition object.

For example, the query SELECT * FROM DocSubClass run on a list of two object stores might return objects that are named DocSubClass from both object stores. If these objects do not have the same Id or AliasId property value, they are not recognized as the same object. In that case, a query on both object stores that uses the name DocSubClass does not return any rows from the second object store. However, the object that is named DocSubClass in the second object store can be referenced by using the string format of the ClassDefinition.Id property, rather than the name format.

PropertyDefinition objects have the Id, PrimaryId, and AliasId properties. For PropertyDefinition objects, the PrimaryId property is used to identify the object, rather than the Id property. (Note that the PrimaryId property is the same as the Id property of the PropertyTemplate object to which the property refers.) Two PropertyDefinition objects from two different object stores are considered to be the same object if either the PrimaryId or AliasId property value of one PropertyDefinition object matches the value of the corresponding property on the other PropertyDefinition object, and both PropertyDefinition objects are on matching ClassDefinition objects.

The AliasId properties for both ClassDefinition and PropertyDefinition objects are cumulative. For instance, suppose that four objects are to be merged from object stores A, B, C, D, with the class ID and alias ID values that are shown in the following table (single digit integers are used for brevity):

Object Store	Class ID	Alias ID	IDs of Class
OS-A	1	2	1, 2
OS-B	2	3	1, 2, 3
OS-C	3	4	1, 2, 3, 4
OS-D	4	(none)	1, 2, 3, 4

The values in the IDs of Class column indicate the cumulative object GUIDs. If any of these values is matched by an ID or Alias ID of another object, the two objects are merged for the purposes of the query. Therefore, all of the objects in the table are aliased together as the same object. Note that this example illustrates how IDs are matched; a class alias scheme this complex in a real deployment is unlikely.
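The cumulative matching can be sketched as a small union-find structure over ID values. This is only a model of the rule as described here, not the server's implementation; the class name and the use of single-digit integers in place of GUIDs are assumptions made for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: IDs that are linked through alias lists end up in one group.
final class AliasMergeSketch {

    // Maps each ID to its parent in a union-find forest.
    private final Map<Integer, Integer> parent = new HashMap<>();

    // Return the representative ID of the group that contains id.
    private int find(int id) {
        parent.putIfAbsent(id, id);
        int p = parent.get(id);
        if (p != id) {
            p = find(p);
            parent.put(id, p); // path compression
        }
        return p;
    }

    // Record a class whose Id is classId and whose alias list is aliasIds.
    void addClass(int classId, int... aliasIds) {
        int root = find(classId);
        for (int alias : aliasIds) {
            parent.put(find(alias), root);
        }
    }

    // Two classes are treated as the same class if their IDs share a group.
    boolean sameClass(int idA, int idB) {
        return find(idA) == find(idB);
    }

    public static void main(String[] args) {
        AliasMergeSketch sketch = new AliasMergeSketch();
        sketch.addClass(1, 2); // OS-A
        sketch.addClass(2, 3); // OS-B
        sketch.addClass(3, 4); // OS-C
        sketch.addClass(4);    // OS-D
        System.out.println(sketch.sameClass(1, 4));
    }
}
```

With the four rows from the table, the class in OS-A and the class in OS-D fall into the same group even though neither lists the other's ID directly.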

The typical aliasing scheme is as follows:

Object Store	Class ID	Alias ID
OS-A	1	(none)
OS-B	2	1
OS-C	3	1
OS-D	4	1

Duplicate matches are not allowed for alias IDs, which means that a single object cannot match more than one other object, and a single property cannot match more than one other property. If alias IDs are set up so that duplicate matches occur, an exception is thrown and the multiple object store query is not allowed for any objects across that combination of object stores (including the objects that contain duplicate alias IDs).

The system administrator normally creates the classes and properties on one object store, and then exports those definitions from that object store and imports them to any other object store that needs to support queries across object stores. This export/import operation ensures that the IDs of the classes and properties are the same in each object store. The imported names are also the same.

If the object stores that must support queries across object stores contain pre-existing objects with different IDs, then the alias IDs must be used as the alternative identifier. In this case, the system administrator must assign alias IDs to the intended matching objects and properties on each object store. When alias IDs are assigned, the ClassDefinition.Id property of an object in one object store is assigned to the AliasId list of that object in another object store. Additionally, the PropertyDefinition.PrimaryId property of a property in one object store is assigned to the AliasId list of the property in another object store.

Note: If two object stores need matching objects, the alias IDs for the corresponding objects must be assigned on only one of the two object stores.

Class and Property Names

When names of classes or properties are determined, it is the first object store in which the class is encountered that determines the name. For example, suppose that there is an object that is named "apple" in the first object store, and "orange" in a second object store, and that both objects have the same GUID value for their Id property. For an object store query that runs across both object stores, any reference to the object with the name "apple" would match both the apple and orange objects. Any name reference to the object with the name "orange" would throw an undefined class exception.

Because the order of the object stores can affect name-based queries, and because reusing one order is more efficient, use the same object store order whenever you perform queries across object stores. Merging object stores A and B does not produce the same results as merging B and A. Therefore, the server must cache merged object store metadata that is order dependent (B & A and also A & B). Changing the order from one query to the next can cause excessive amounts of metadata to be cached, resulting in either excessive memory use or thrashing caused by metadata that is flushed from the cache (to restrict its size) and then reconstituted later.
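The "first object store wins" naming rule, and its dependence on order, can be modeled as follows. This is a plain-Java sketch, not Content Engine code; the class name, the method names, and the string "guid-1" standing in for a real GUID are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how merged metadata picks a name for each class GUID.
final class NameResolutionSketch {

    // Each store maps a class GUID to its local name. The first store in the
    // list that defines a GUID determines the name used in the merged metadata.
    static Map<String, String> mergedNames(List<Map<String, String>> storesInOrder) {
        Map<String, String> namesById = new LinkedHashMap<>();
        for (Map<String, String> store : storesInOrder) {
            for (Map.Entry<String, String> entry : store.entrySet()) {
                namesById.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return namesById;
    }

    // A name can be referenced in a query only if the merged metadata uses it.
    static boolean isDefined(Map<String, String> merged, String name) {
        return merged.containsValue(name);
    }

    public static void main(String[] args) {
        Map<String, String> first = Collections.singletonMap("guid-1", "apple");
        Map<String, String> second = Collections.singletonMap("guid-1", "orange");
        Map<String, String> merged = mergedNames(Arrays.asList(first, second));
        System.out.println(isDefined(merged, "apple"));
        System.out.println(isDefined(merged, "orange"));
    }
}
```

Reversing the list passed to mergedNames swaps which name is defined, which is why the merged metadata for A and B cannot be reused for B and A.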

Merge Mode

The merge mode that is specified for a query across object stores affects how classes and properties are merged. There are two merge modes: intersection and union (MergeMode.INTERSECTION and MergeMode.UNION).

For an intersection merge, only objects and properties that are defined in all object stores are present in the merged metadata, and only these objects and properties can be referenced in a search. Any class or property that exists in one object store, but does not have a matching class or property in every object store, is excluded from the merged metadata, and cannot be used in a search.

For a union merge, all classes and properties from all object stores are present in the merged metadata, and all classes and properties can be returned.

As an example, assume the following:

There are three target object stores: OS1, OS2, and OS3.
The object "Alpha" exists in each of these object stores.
The IDs of "Alpha" match in each object store.
The IDs of the properties of "Alpha" match if the names match.

(Note that OS1 is the first object store in the collection.) The following custom properties then exist for "Alpha" in each object store:

OS1 - PropertyA, PropertyB, PropertyC
OS2 - PropertyB, PropertyC, PropertyD
OS3 - PropertyA, PropertyB, PropertyC, PropertyD

If you specify MergeMode.UNION, the properties that are returned are:

PropertyA, which represents OS1, OS3
PropertyB, which represents OS1, OS2, OS3
PropertyC, which represents OS1, OS2, OS3
PropertyD, which represents OS2, OS3

If you specify MergeMode.INTERSECTION, the properties that are returned are:

PropertyB
PropertyC

An attempt to select either PropertyA or PropertyD results in an undefined property exception.
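The two merge modes correspond to set union and set intersection over the properties of matching classes. The following sketch models that with plain Java sets, under the example's assumption that properties with the same name have matching IDs; it does not use the MergeMode class itself, and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of merged property metadata for one matching class.
final class MergeModeSketch {

    // Convenience factory for a sorted set of property names.
    static Set<String> props(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    // Union: every property from every object store is present.
    static Set<String> union(List<Set<String>> stores) {
        Set<String> result = new TreeSet<>();
        for (Set<String> store : stores) {
            result.addAll(store);
        }
        return result;
    }

    // Intersection: only properties defined in all object stores are present.
    static Set<String> intersection(List<Set<String>> stores) {
        Set<String> result = new TreeSet<>(stores.get(0));
        for (Set<String> store : stores) {
            result.retainAll(store);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> stores = Arrays.asList(
                props("PropertyA", "PropertyB", "PropertyC"),               // OS1
                props("PropertyB", "PropertyC", "PropertyD"),               // OS2
                props("PropertyA", "PropertyB", "PropertyC", "PropertyD")); // OS3
        System.out.println(union(stores));
        System.out.println(intersection(stores));
    }
}
```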

If the classes had the same GUIDs for the same names, but the properties had different GUIDs and were not aliased, the MergeMode.UNION for the previous example would have the following properties:

PropertyA, which represents OS1
PropertyA, which represents OS3
PropertyB, which represents OS1
PropertyB, which represents OS2
PropertyB, which represents OS3
PropertyC, which represents OS1
PropertyC, which represents OS2
PropertyC, which represents OS3
PropertyD, which represents OS2
PropertyD, which represents OS3

If you run the select statement "SELECT * FROM Alpha", the result is a row with 10 columns for each object store that contains a row. Each column in a returned row is non-null only if the row comes from the object store that the column represents in the preceding list.

If the select statement was SELECT PropertyA, PropertyB, PropertyC, PropertyD FROM Alpha, PropertyA would come only from OS1 and would be null for rows from any other object store. Similarly, PropertyB would come only from OS1, PropertyC from OS1, and PropertyD from OS2. You cannot select only PropertyB from OS3 based on the property name, so this configuration is not useful, illustrating why you need to put alias IDs on properties (or export/import across object stores to make the IDs match); otherwise, the query results might not be meaningful.

Returned Objects

For queries across object stores, when a property that has the same GUID does not have the same name in each object store, the type of object that is returned affects the property name. If RepositoryRow objects are returned, the property gets the name from the first object store in which it is defined, and the name is the same for rows from any subsequent object store in the list. If IndependentObject objects are returned, the property is named according to each object store in which it is defined.

RepositoryRow objects differ from IndependentObject objects in some notable ways:

A RepositoryRow object cannot be used for updates.
A RepositoryRow object can have data from multiple IndependentObject objects if joins are used in the query.
A RepositoryRow object can have duplicate properties.

As an example, suppose you run the statement SELECT apple FROM someclass against a list of two object stores where, in the first object store, the property "apple" matches (by a GUID) a property named "orange" in a second object store. A query that returns RepositoryRow objects will always return properties that are named "apple", regardless of which object store they came from, but a query that returns IndependentObject objects will return a property name of "apple" for data from the first object store and a property name of "orange" for data that is returned from the second object store. If this were not the case, attempts to do updates by using the IndependentObject objects that are returned from the second object store would generate the error "Property apple not defined."

When RepositoryRow objects are returned, properties can be renamed. For instance, you might call SearchScope.fetchRows with the statement SELECT Owner AS Bob FROM Document. In the results, each RepositoryRow object has a property that is named Bob. You cannot use the AS clause when returning IndependentObject objects; however, unlike RepositoryRow objects, IndependentObject objects can be used in a subsequent update.

Content Searches

Content (full-text) searches include in the query words or phrases that might be stored in the content of objects, or in the string-valued properties of these objects. For the content in an object or its string-valued properties to be searched, you must enable content-based retrieval (CBR) for the object and optionally any of its string-valued properties that you want to be included in a content search. CBR-enablement is controlled by the Boolean value of the IsCBREnabled property on the following objects:

ClassDefinition
The IsCBREnabled property enables full-text searches of content (if any exists) for the class, and allows string-valued properties to be enabled for full-text searches.
PropertyDefinitionString
The IsCBREnabled property enables the string-valued property to be included in content searches.

The IsCBREnabled property can be enabled only for Document, Annotation, CustomObject, and Folder objects.

A content search is initiated by a CONTAINS function in the SQL statement that is contained in SearchSQL. The CONTAINS function can search content in all properties, or in a single property.

For more information about the CONTAINS functions, see CBR Queries. For information about administrative interfaces for full-text information, see Content-Based Retrieval.

Note: Full-text queries can take a considerable amount of time to run. Some queries can finish in a few seconds, while others can potentially run for hours. Write your applications to allow the user to set a timeout; a single default value is probably not sufficient. The user settings ensure that either the query does not run longer than wanted, or that the timeout value is high enough to enable the query to finish execution. Note that the timeout value is the time that is required to fetch a page for a continuable query, not the time to fetch all pages for the query.

Stored Searches

A StoredSearch object can be one of two types: stored search or search template. Both types are persisted to an object store and are designed for performing searches multiple times.

Note: Stored searches are only available for use when the Stored Search Extensions add-on is installed.

The content of a StoredSearch object is the search criteria in the form of an XML string. It is subclassed from the Document object, so when you instantiate a StoredSearch object, you can work with it in the same ways as you work with a Document object (such as checking out the stored search, setting its content, checking it back in, filing it into a folder, and deleting it).

A StoredSearch object is identified as a stored search or a search template by the value of the searchtype element in the XML. The StoredSearch object can query for Document, Folder, or CustomObject objects. The XML objecttype attribute identifies the object type for the query.

Only one of the object types (Document, Folder, or Custom Object) can be specified per search clause in the XML. Each search clause must be handled as an individual query, requiring a separate SearchScope call to run each search clause.

You can create stored searches and search templates by using the search view in IBM Content Navigator and by saving the XML in a StoredSearch object in an object store. All stored searches must conform to the Stored Search schema. Use the SearchScope methods fetchObjects and fetchRows having StoredSearch in their signature to run a stored search.

By using the SearchTemplate* classes (those classes that have "SearchTemplate" as a prefix), you can make runtime modifications to the stored search or search template XML that is persisted in a StoredSearch object. The XML modifications are passed to a SearchScope call in a SearchTemplateParameters instance.

For more information, see Searching for Objects Using a Stored Search.

Stored Search Type

A stored search predefines a query to retrieve Document, Folder or Custom Object objects (or subclasses of those classes) from one or more object stores. Only one object type can be specified per search clause.

Search Template Type

A search template can provide some or all of the search criteria and values for the query that it defines. The template design gives the user the opportunity to modify the values of writable properties before the search is run. The search template identifies how the fields are to be processed (which ones require the user to assign a value, which fields are automatically pre-assigned, which fields can be modified or are read-only, and so on).

Search templates support Document, Folder, or Custom Object substitution at run time, enabling users to select documents, folders, or custom objects that are different from those specified in the search template XML. The specified objects are modified or replaced individually based on the itemid attribute of the relevant XML element.

Background Searches

Background search is a feature that enables you to run a search as a background process, which is similar to a sweep process. The results of a background search are stored as a set of independently persistable and queryable Content Engine objects, which can be examined when necessary. Running a query as part of a background search has the following advantages over a conventional search:

You can start a background search and proceed with other activities while the search is running.
The background search feature provides a reporting framework, which allows you to process the search results.

A background search uses classes that are based on two interfaces: CmBackgroundSearch and CmAbstractSearchResult. The CmBackgroundSearch interface is the base interface from which you define a subclass that defines the background search. The CmAbstractSearchResult interface is the base interface from which you define a subclass that defines the result objects that are returned as a result of a background search.

To set up a background search, an administrator first defines an SQL query search expression to use for the background search and defines the optional parameters that can be entered in the query by a user. In addition, the class of the objects and the property values to be saved in the background result set are defined. The following two procedures are described as if performed by using the API. Typically, however, a background search is set up and run by using the Content Platform Engine administration console. To set up a new background search, follow this procedure:

Identify what objects are to be searched and determine what filtering to use to produce the result set objects. Use this information to create the FROM and WHERE clauses in the background search SQL expression.
Decide which background search expression parameters you want to allow a user to include in the search expression, and add those parameters to the search expression.
Create a CmAbstractSearchResult subclass definition.
Determine the property values that you want to examine from the background search results and define a property template for each property value that you want to capture.
Use the new property templates to add custom properties to the CmAbstractSearchResult subclass definition. These custom properties match the property values that are returned in the set of search result objects.
Create a CmBackgroundSearch subclass definition.
Set the default value of the SearchExpression property definition of the CmBackgroundSearch subclass definition to the completed SQL search expression.
Set the required class of the SearchResults property definition to the CmAbstractSearchResult subclass that was previously defined.
Define a property template for each parameter that is included in the search expression.
Use the new property templates to add custom property definitions to the CmBackgroundSearch subclass definition.

To start a new background search, follow this procedure:

Create an instance of the CmBackgroundSearch subclass that represents the search.
Supply a value to each parameter-defining custom property of the CmBackgroundSearch object that you created.
Save the CmBackgroundSearch object. The server then starts the background search automatically.

After a background search is started, the Content Platform Engine server performs the following steps:

The server instantiates the CmAbstractSearchResult subclass that you defined for each object that is returned in the background search and stores the results in a CmAbstractSearchResultSet object collection. This collection can be retrieved by reading the SearchResults property of the CmBackgroundSearch object. The custom properties that were defined in the CmAbstractSearchResult subclass are included in the Properties collection of each CmAbstractSearchResult object.
The server populates the custom properties in the Properties collection of each CmAbstractSearchResult object with the values of the custom properties (either matched by symbolic name or mapped by an AS clause) that were selected in the background search query.

You can monitor the background search as you would monitor any sweep job by using the Content Platform Engine administration console. Because the results of the background search grow incrementally as the background process progresses, you can view the in-process results at any time by examining the SearchResults property enumeration of the CmBackgroundSearch object or by querying a CmAbstractSearchResult object. You can restrict the visibility of the search results by setting the ACL of the CmBackgroundSearch object.

Background Search Expression Parameters

Background search parameters are defined by custom properties that are added to a CmBackgroundSearch subclass that defines a particular background search. Note that not all the custom properties that are added to the subclass need to be used as parameters; they can also be defined for other purposes.

To define a parameter, follow these steps:

Create and save a custom property that defines the parameter. Specify the custom property as requiring a value and settable only on create. The custom property can be of any property type and cardinality except for binary and enumeration of object. The name of the parameter is defined as the symbolic name that is assigned to the custom property.
Although not required, it is recommended that you assign to the custom property a display name and descriptive text that indicates how the parameter is used.
Add the custom property to the CmBackgroundSearch subclass that defines the background search.

When a defined parameter is added to a SQL expression, the server substitutes the parameter with text to form the effective SQL according to the data type of its underlying custom property. The allowable custom property data types and their substitution SQL text are listed as follows:

List: a comma-separated list of the text form of the individual element values, which are surrounded by parentheses. For example: (1,2,3,4).
Singleton Boolean: True or False.
Integer: the natural toString() representation of the value.
Float: the natural toString() representation of the value.
Id: the natural toString() representation of the value.
DateTime: the W3C representation of the value, of the form yyyy-mm-ddThh:mm:ssZ, as is required by the SQL syntax.
String: a string value that is surrounded by single quotation marks.
Object: an object literal of the form OBJECT({id of referenced object}).
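The substitution rules in the list above can be sketched as a small formatting function. This is an illustrative model only, not the server's code: the class and method names are hypothetical, java.time.Instant stands in for a DateTime value, Double stands in for Float, and the sketch assumes that list elements are formatted by the same rules as single values. It does not cover Id or Object values and does not escape quotation marks inside strings.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of how a parameter value becomes SQL text.
final class ParameterTextSketch {

    static String toSqlText(Object value) {
        if (value instanceof List) {
            // List: comma-separated element text surrounded by parentheses.
            return ((List<?>) value).stream()
                    .map(ParameterTextSketch::toSqlText)
                    .collect(Collectors.joining(",", "(", ")"));
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? "True" : "False";
        }
        if (value instanceof Integer || value instanceof Double) {
            // Integer and Float: the natural toString() representation.
            return value.toString();
        }
        if (value instanceof Instant) {
            // DateTime: W3C form yyyy-mm-ddThh:mm:ssZ (seconds precision).
            Instant seconds = ((Instant) value).truncatedTo(ChronoUnit.SECONDS);
            return DateTimeFormatter.ISO_INSTANT.format(seconds);
        }
        if (value instanceof String) {
            // String: surrounded by single quotation marks (no escaping here).
            return "'" + value + "'";
        }
        throw new IllegalArgumentException("Type not covered by this sketch: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toSqlText(Arrays.asList(1, 2, 3, 4)));
        System.out.println(toSqlText(Instant.parse("2023-05-10T12:00:00Z")));
    }
}
```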

Custom Search Functions

Custom search functions are functions that you create in an object store that can be used in the SELECT list of a SQL statement for both ad hoc searches and background searches. Each custom search function receives one or more input parameters and outputs a return value. The data type of input parameters can be of any type; however, the return value cannot be a collection object type (cardinality of list or enumeration).

Restriction:

Custom search functions can only be used with the fetchRows method, not with the fetchObjects method.
Custom search functions can only be used in ad hoc or background searches in a selection list. They cannot be used in a stored search.
For searches that combine both a content-based retrieval (CBR) search and a relational search on a database, custom search functions are not allowed in searches where the database is searched first. Such searches result in an "Invalid node type" error. For more information, see CBR Query Optimization.

To add a custom search function to an object store, create an instance of the CmSearchFunctionDefinition interface. The CmSearchFunctionDefinition interface is a subinterface of the Action interface and provides handler subinterfaces that you implement with the actions to be taken, coded as JavaScript or Java™ components. A CmSearchFunctionDefinition object identifies an implemented handler with the ProgId property. A handler that is implemented with JavaScript is set on the ScriptText property. A handler that is implemented for Java (JAR or class file) can be checked into a Content Engine object store as a CodeModule object, requiring that the CodeModule property be set. Alternatively, you can set the location of the Java component in the class path of the application server. In addition to the properties that are present in objects based on the Action class, a CmSearchFunctionDefinition object also includes the CmFunctionName property. This property is populated by the server and specifies the name of the custom search function as it appears in an SQL expression. The search function name must be of the form <namespace>::<name>, where both <namespace> and <name> adhere to the Content Engine symbolic name conventions, and be unique relative to other search function names. The code that is specified by a custom search function must implement the methods of the SearchFunctionHandler interface.
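As a rough illustration of the <namespace>::<name> form, the following sketch checks a candidate function name before it is assigned. The class and method names are hypothetical, and the symbolic-name rule used here (a letter followed by letters, digits, or underscores) is an assumption; check the Content Engine symbolic name conventions for the authoritative rule. Uniqueness relative to other function names is not checked.

```java
import java.util.regex.Pattern;

// Hypothetical client-side check; the server performs its own validation.
final class FunctionNameSketch {

    // ASSUMPTION: a symbolic name is a letter followed by letters, digits,
    // or underscores. Adjust to the real symbolic name conventions.
    private static final String SYMBOLIC = "[A-Za-z][A-Za-z0-9_]*";

    // A qualified function name is <namespace>::<name>.
    private static final Pattern QUALIFIED =
            Pattern.compile(SYMBOLIC + "::" + SYMBOLIC);

    static boolean isWellFormed(String functionName) {
        return functionName != null && QUALIFIED.matcher(functionName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("acme::Discount"));
        System.out.println(isWellFormed("Discount"));
    }
}
```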

To use a custom search function in an SQL statement, see Custom Search Function query syntax.