Queries
The essential parts of a Content Engine search are a SQL statement, contained in a SearchSQL instance, and the object store or object stores searched, contained in a SearchScope object. Content searches are specified through the CONTAINS operator in the SQL statement.
The SQL Statement
There are helper methods on the SearchSQL class to assist you in constructing a SQL statement. Alternatively, you can construct a SQL statement independently and pass it to a SearchSQL instance as a string. SQL statements must follow the IBM® FileNet standard, which generally conforms to SQL-92, with extensions for IBM FileNet specific constructs. For a complete description, see SQL Syntax Reference.
The SearchSQL helper methods are supplied for assistance in building SQL statements and cannot provide the level of specification you can achieve with an independently constructed statement. However, in a development environment, you can use the helper methods for initial construction of the SQL statement, then use the SearchSQL.toString method to get the SQL statement string and manually refine the SQL statement.
The Search Scope
The SearchScope methods run the SQL statement on one or more object stores to find objects (IndependentObject instances), database rows (RepositoryRow instances), or metadata (ClassDescription instances).
You can use the SearchScope class to search one or more object stores by using a single query. To create a query on multiple object stores, call the constructor for SearchScope with an array of object stores, similar to the following code snippet:
ObjectStore[] osArray = new ObjectStore[]{os1,os2};
SearchScope objStores = new SearchScope(osArray, MergeMode.INTERSECTION);
Then, use the SearchScope instance to run a query. The query merges the results from the object stores and returns them in a single, ordered list.
For example, if the search statement SELECT DocumentTitle FROM Document WHERE DocumentTitle LIKE 'C%' ORDER BY DocumentTitle is run on a list of two object stores, the results (in a single collection) might be:
Cars
City
Concrete
Cows
Cars and Concrete might come from the first object store, and City and Cows might come from the second object store. Note that the results from the different object stores are intermingled in the list, ordered by the ORDER BY clause of the search statement.
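The merge behavior described above can be sketched in plain Java. This is an illustrative model only: it does not call the Content Engine API, the class and method names are hypothetical, and it assumes that the server's ORDER BY ordering matches Java's natural String ordering, which might not hold for every database collation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative model only; not part of the Content Engine API. */
final class MergedResultsSketch {

    /**
     * Combines the DocumentTitle values found in each object store into a
     * single list that is ordered as an ORDER BY DocumentTitle clause would
     * order it (assumption: natural String ordering).
     */
    static List<String> mergeOrdered(List<List<String>> resultsPerStore) {
        List<String> merged = new ArrayList<>();
        for (List<String> storeResults : resultsPerStore) {
            merged.addAll(storeResults);
        }
        // Rows from different stores end up intermingled after sorting.
        merged.sort(Comparator.naturalOrder());
        return merged;
    }
}
```

Calling mergeOrdered with the lists [Cars, Concrete] and [City, Cows] yields Cars, City, Concrete, Cows, with rows from the two stores intermingled as in the example.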
Matching Classes and Properties
Classes and properties are defined in each object store. A class or property in one object store is considered to be the same class or property that exists in another object store only if the compared classes or properties have matching GUIDs. Having the same name does not indicate that the compared classes or properties are the same.
GUID values are stored in properties on both the ClassDefinition and PropertyDefinition classes.
A ClassDefinition object has both an Id property that is a GUID, and an AliasIds property that is a list of GUIDs. The Id property holds the GUID that is used to identify ClassDefinition objects. The AliasId properties can alternatively be used to identify these objects. Two ClassDefinition objects from two different object stores are considered to be the same if the value of either the Id property or AliasId property of one ClassDefinition object matches the value of the corresponding property on the other ClassDefinition object.
For example, the query SELECT * from DocSubClass run on a list of two object stores might return objects that are named DocSubClass from both object stores. If these objects do not have the same Id or AliasId property value, they are not recognized as the same object. If you attempt to query both object stores with the name DocSubClass, the query does not return any rows from the second object store. However, the object that is named DocSubClass in the second object store can be referenced by using the string format of the ClassDefinition.Id property, rather than the name format.
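As a rough illustration of the matching rule, the following plain-Java sketch treats two class definitions as the same when the set formed by the Id and alias IDs of one overlaps the corresponding set of the other. The ClassIds type and the sameClass method are hypothetical stand-ins, not Content Engine API types, and short strings stand in for GUIDs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model only; not part of the Content Engine API. */
final class ClassMatchSketch {

    /** Hypothetical holder for the identifying values of one class definition. */
    static final class ClassIds {
        final String id;
        final List<String> aliasIds;

        ClassIds(String id, List<String> aliasIds) {
            this.id = id;
            this.aliasIds = aliasIds;
        }

        /** The Id together with every alias ID. */
        Set<String> allIds() {
            Set<String> all = new HashSet<>(aliasIds);
            all.add(id);
            return all;
        }
    }

    /** True when any Id or alias ID of one definition matches one of the other. */
    static boolean sameClass(ClassIds first, ClassIds second) {
        Set<String> shared = first.allIds();
        shared.retainAll(second.allIds());
        return !shared.isEmpty();
    }
}
```

In this model, two classes that share only a name never match, which is why a name-based query can miss the class in the second object store.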
PropertyDefinition objects have the Id, PrimaryId, and AliasId properties. For PropertyDefinition objects, the PrimaryId property is used to identify the object, rather than the Id property. (Note that the PrimaryId property is the same as the Id property of the PropertyTemplate object to which the property refers.) Two PropertyDefinition objects from two different object stores are considered to be the same object if either the PrimaryId or AliasId property value of one PropertyDefinition object matches the value of the corresponding property on the other PropertyDefinition object, and both PropertyDefinition objects are on matching ClassDefinition objects.
The AliasId properties for both ClassDefinition and PropertyDefinition objects are cumulative. For instance, suppose that four objects are to be merged from object stores A, B, C, D, with the class ID and alias ID values that are shown in the following table (single digit integers are used for brevity):
|
The values in the IDs of Class column indicate the cumulative object GUIDs. If any of these values is matched by any ID or alias ID of another object, the two objects are merged for the purposes of the query. Therefore, all of the objects in the table are aliased together as the same object. Note that this example illustrates how IDs are matched; a class alias scheme this complex in a real deployment is unlikely.
The typical aliasing scheme is as follows:
|
Duplicate matches are not allowed for alias IDs, which means that a single object cannot match more than one other object, and a single property cannot match more than one other property. If alias IDs are set up so that duplicate matches occur, an exception is thrown and the multiple object store query is not allowed for any objects across that combination of object stores (including the objects that contain duplicate alias IDs).
The system administrator normally creates the classes and properties on one object store, and then exports those definitions from that object store and imports them to any other object store that needs to support queries across object stores. This export/import operation ensures that the IDs of the classes and properties are the same in each object store. The imported names are also the same.
If the object stores that must support queries across object stores contain pre-existing objects with different IDs, then the alias IDs must be used as the alternative identifier. In this case, the system administrator must assign alias IDs to the intended matching objects and properties on each object store. When alias IDs are assigned, the ClassDefinition.Id property of an object in one object store is assigned to the AliasId list of that object in another object store. Additionally, the PropertyDefinition.PrimaryId property of a property in one object store is assigned to the AliasId list of the property in another object store.
Class and Property Names
When names of classes or properties are determined, it is the first object store in which the class is encountered that determines the name. For example, suppose that there is an object that is named "apple" in the first object store, and "orange" in a second object store, and that both objects have the same GUID value for their Id property. For an object store query that runs across both object stores, any reference to the object with the name "apple" would match both the apple and orange objects. Any name reference to the object with the name "orange" would throw an undefined class exception.
Because the search order of the object stores can affect name-based queries, use the same object store order whenever you perform queries across object stores. Doing so is also more efficient. Merging object stores A and B does not produce the same results as merging B and A. Therefore, the server must cache merged object store metadata separately for each order (A & B and also B & A). Changing the order from one query to the next can cause excessive amounts of metadata to be cached, resulting in either excessive memory use or thrashing, in which metadata is flushed from the cache (to restrict its size) and then reconstituted later.
Merge Mode
The merge mode that is specified for a query across object stores affects how classes and properties are merged. There are two merge modes: intersection and union (MergeMode.INTERSECTION and MergeMode.UNION).
For an intersection merge, only objects and properties that are defined in all object stores are present in the merged metadata, and only these objects and properties can be referenced in a search. Any class or property that exists in one object store, but does not have a matching class or property in every object store, is excluded from the merged metadata, and cannot be used in a search.
For a union merge, all classes and properties from all object stores are present in the merged metadata, and all classes and properties can be returned.
As an example, assume the following:
- There are three target object stores: OS1, OS2, and OS3.
- The object "Alpha" exists in each of these object stores.
- The IDs of "Alpha" match in each object store.
- The IDs of the properties of "Alpha" match if the names match.
(Note that OS1 is the first object store in the collection.) The following custom properties then exist for "Alpha" in each object store:
- OS1 - PropertyA, PropertyB, PropertyC
- OS2 - PropertyB, PropertyC, PropertyD
- OS3 - PropertyA, PropertyB, PropertyC, PropertyD
If you specify MergeMode.UNION, the properties that are returned are:
- PropertyA, which represents OS1
- PropertyB, which represents OS1, OS2, OS3
- PropertyC, which represents OS1, OS2, OS3
- PropertyD, which represents OS2, OS3
If you specify MergeMode.INTERSECTION, the properties that are returned are:
- PropertyB
- PropertyC
An attempt to select either PropertyA or PropertyD results in an undefined property exception.
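The two merge modes in this example behave like set union and set intersection over the property lists. The following plain-Java sketch models that behavior under the example's assumption that properties with the same name have matching IDs. The class and method names are hypothetical, and nothing here calls the Content Engine API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model only; not part of the Content Engine API. */
final class MergeModeSketch {

    /** The custom properties of "Alpha" per object store, in scope order. */
    static Map<String, List<String>> alphaExample() {
        Map<String, List<String>> propsByStore = new LinkedHashMap<>();
        propsByStore.put("OS1", List.of("PropertyA", "PropertyB", "PropertyC"));
        propsByStore.put("OS2", List.of("PropertyB", "PropertyC", "PropertyD"));
        propsByStore.put("OS3", List.of("PropertyA", "PropertyB", "PropertyC", "PropertyD"));
        return propsByStore;
    }

    /** Models MergeMode.UNION: every property from every store is present. */
    static Set<String> union(Map<String, List<String>> propsByStore) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> props : propsByStore.values()) {
            merged.addAll(props);
        }
        return merged;
    }

    /** Models MergeMode.INTERSECTION: only properties defined in all stores remain. */
    static Set<String> intersection(Map<String, List<String>> propsByStore) {
        Set<String> merged = null;
        for (List<String> props : propsByStore.values()) {
            if (merged == null) {
                merged = new LinkedHashSet<>(props);
            } else {
                merged.retainAll(props);
            }
        }
        return merged == null ? new LinkedHashSet<>() : merged;
    }
}
```

For the Alpha example, union returns PropertyA through PropertyD, and intersection returns only PropertyB and PropertyC.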
If the classes had the same GUIDs for the same names, but the properties had different GUIDs and were not aliased, the MergeMode.UNION for the previous example would have the following properties:
- PropertyA, which represents OS1
- PropertyB, which represents OS1
- PropertyB, which represents OS2
- PropertyB, which represents OS3
- PropertyC, which represents OS1
- PropertyC, which represents OS2
- PropertyC, which represents OS3
- PropertyD, which represents OS2
- PropertyD, which represents OS3
If you run the select statement "SELECT * FROM Alpha", the result is a row with 10 columns for each object store that contains a row. Each column in the returned rows is non-null only if the row is from the object store that the column represents in the preceding list.
If the select statement were SELECT PropertyA, PropertyB, PropertyC, PropertyD FROM Alpha, PropertyA would come only from OS1 and would be null for rows from any other object store. Similarly, PropertyB would come only from OS1, PropertyC from OS1, and PropertyD from OS2. You cannot select only PropertyB from OS3 based on the property name, so this configuration is not useful. It illustrates why you need to put alias IDs on properties (or export/import across object stores to make the IDs match); otherwise, the query results might not be meaningful.
Returned Objects
For queries across object stores, when a property that has the same GUID does not have the same name in each object store, the type of objects that are returned affects the property name. If RepositoryRow objects are returned, the property gets the name from the first object store in which it is defined, and the name is the same for rows from any subsequent object store in the list. If IndependentObject objects are returned, the property is named according to each object store in which it is defined.
RepositoryRow objects differ from IndependentObject objects in some notable ways:
- A RepositoryRow object cannot be used for updates.
- A RepositoryRow object can have data from multiple IndependentObject objects if joins are used in the query.
- A RepositoryRow object can have duplicate properties.
As an example, suppose you run the statement SELECT apple FROM someclass against a list of two object stores, where the property "apple" in the first object store matches (by GUID) a property named "orange" in the second object store. A query that returns RepositoryRow objects always returns properties that are named "apple", regardless of which object store they came from, but a query that returns IndependentObject objects returns a property name of "apple" for data from the first object store and a property name of "orange" for data from the second object store. If this were not the case, attempts to do updates by using the IndependentObject objects that are returned from the second object store would generate the error "Property apple not defined."
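The naming rule can be reduced to a small plain-Java model. In this sketch, a list holds the name that one GUID-matched property has in each object store, in scope order. The class and method names are hypothetical, the property is assumed to be defined in every store, and nothing here calls the Content Engine API.

```java
import java.util.List;

/** Illustrative model only; not part of the Content Engine API. */
final class PropertyNameSketch {

    /** Rows take the name from the first store, whichever store the row came from. */
    static String nameInRow(List<String> namesByStore, int storeIndex) {
        return namesByStore.get(0);
    }

    /** Independent objects keep the name that is defined in their own store. */
    static String nameInObject(List<String> namesByStore, int storeIndex) {
        return namesByStore.get(storeIndex);
    }
}
```

With the names [apple, orange], a row from the second store exposes the property as apple, while an object from the second store exposes it as orange.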
When RepositoryRow objects are returned, properties can be renamed. For instance, you might call SearchScope.fetchRows to run SELECT Owner AS Bob FROM Document. In the results, each RepositoryRow object has a property that is named Bob. Although you cannot use the AS clause when returning IndependentObject objects, those objects can be used in a subsequent update.
Content Searches
Content (full-text) searches include in the query words or phrases that might be stored in the content of objects, or in the string-valued properties of these objects. For the content in an object or its string-valued properties to be searched, you must enable content-based retrieval (CBR) for the object and optionally any of its string-valued properties that you want to be included in a content search. CBR-enablement is controlled by the Boolean value of the IsCBREnabled property on the following objects:
- ClassDefinition: The IsCBREnabled property enables full-text searches of content (if any exists) for the class, and allows string-valued properties to be enabled for full-text searches.
- PropertyDefinitionString: The IsCBREnabled property enables the string-valued property to be included in content searches.
The IsCBREnabled property can be enabled only for Document, Annotation, CustomObject, and Folder objects.
A content search is initiated by a CONTAINS function in the SQL statement that is contained in SearchSQL. The CONTAINS function can search content in all properties, or in a single property.
For more information about the CONTAINS functions, see CBR Queries. For information about administrative interfaces for full-text information, see Content-Based Retrieval.
Stored Searches
A StoredSearch object can be one of two types: stored search or search template. Both types are persisted to an object store and are designed for performing searches multiple times.
The content of a StoredSearch object is the search criteria in the form of an XML string. The StoredSearch class is a subclass of the Document class, so when you instantiate a StoredSearch object, you can work with it in the same ways as you work with a Document object (such as checking out the stored search, setting its content, checking it back in, filing it into a folder, and deleting it).
A StoredSearch object is identified as a stored search or a search template by the value of the searchtype element in the XML. The StoredSearch object can query for Document, Folder, or CustomObject objects. The XML objecttype attribute identifies the object type for the query. Only one of the object types (Document, Folder, or Custom Object) can be specified per search clause in the XML. Each search clause must be handled as an individual query, requiring a separate SearchScope call to run each search clause.
You can create stored searches and search templates by using the search view in IBM Content Navigator and by saving the XML in a StoredSearch object in an object store. All stored searches must conform to the Stored Search schema. Use the SearchScope methods fetchObjects and fetchRows having StoredSearch in their signature to run a stored search.
By using the SearchTemplate* classes (those classes that have "SearchTemplate" as a prefix), you can make runtime modifications to the stored search or search template XML that is persisted in a StoredSearch object. The XML modifications are passed to a SearchScope call in a SearchTemplateParameters instance.
For more information, see Searching for Objects Using a Stored Search.
Stored Search Type
A stored search predefines a query to retrieve Document, Folder or Custom Object objects (or subclasses of those classes) from one or more object stores. Only one object type can be specified per search clause.
Search Template Type
A search template can provide some or all of the search criteria and values for the query that it defines. The template design gives the user the opportunity to modify the values of writable properties before the search is run. The search template identifies how the fields are to be processed (which ones require the user to assign a value, which fields are automatically pre-assigned, which fields can be modified or are read-only, and so on).
Search templates support Document, Folder, or Custom Object substitution at run time, enabling users to select documents, folders, or custom objects different from those specified in the search template XML. The specified objects are modified or replaced individually based on the itemid attribute of the relevant XML element.
Background Searches
- You can start a background search and proceed with other activities while the search is running.
- The background search feature provides a reporting framework, which allows you to process the search results.
A background search uses classes that are based on two interfaces: CmBackgroundSearch and CmAbstractSearchResult. The CmBackgroundSearch interface is the base interface from which you define a subclass that defines the background search. The CmAbstractSearchResult interface is the base interface from which you define a subclass that defines the result objects that are returned as a result of a background search.
- Identify what objects are to be searched and determine what filtering to use to produce the result set objects. Use this information to create the FROM and WHERE clauses in the background search SQL expression.
- Decide which background search expression parameters you want to allow a user to include in the search expression, and add those parameters to the search expression.
- Create a CmAbstractSearchResult subclass definition.
- Determine the property values that you want to examine from the background search results and define a property template for each property value that you want to capture.
- Use the new property templates to add custom properties to the CmAbstractSearchResult subclass definition. These custom properties match the property values that are returned in the set of search result objects.
- Create a CmBackgroundSearch subclass definition.
- Set the default value of the SearchExpression property definition of the CmBackgroundSearch subclass definition to the completed SQL search expression.
- Set the required class of the SearchResults property definition to the CmAbstractSearchResult subclass that was previously defined.
- Define a property template for each parameter that is included in the search expression.
- Use the new property templates to add custom property definitions to the CmBackgroundSearch subclass definition.
- Create an instance of the CmBackgroundSearch subclass that represents the search.
- Supply a value to each parameter-defining custom property of the CmBackgroundSearch object that you created.
- Save the CmBackgroundSearch object. The server then starts the background search automatically.
- The server instantiates the CmAbstractSearchResult subclass that you defined for each object that is returned in the background search and stores the results in a CmAbstractSearchResultSet object collection. This collection can be retrieved by reading the SearchResults property of the CmBackgroundSearch object. The custom properties that were defined in the CmAbstractSearchResult subclass are included in the Properties collection of each CmAbstractSearchResult object.
- The server populates the custom properties in the Properties collection of each CmAbstractSearchResult object with the values of the custom properties (either matched by symbolic name or mapped by an AS clause) that were selected in the background search query.
You can monitor the background search as you would monitor any sweep job by using the Content Platform Engine administration console. Because the results of the background search grow incrementally as the background process progresses, you can view the in-process results at any time by examining the SearchResults property enumeration of the CmBackgroundSearch object or by querying a CmAbstractSearchResult object.
You can restrict the visibility of the search results by setting the ACL of the CmBackgroundSearch object.
Background Search Expression Parameters
Background search parameters are defined by custom properties that are added to a CmBackgroundSearch subclass that defines a particular background search. Note that not all the custom properties that are added to the subclass need to be used as parameters; they can also be defined for other purposes.
- Create and save a custom property that defines the parameter. Specify the custom property as requiring a value and settable only on create. The custom property can be of any property type and cardinality except for binary and enumeration of object. The name of the parameter is defined as the symbolic name that is assigned to the custom property.
- Although not required, it is recommended that you assign to the custom property a display name and descriptive text that indicates how the parameter is used.
- Add the custom property to the CmBackgroundSearch subclass that defines the background search.
- List: a comma-separated list of the text form of the individual element values, surrounded by parentheses. For example: (1,2,3,4).
- Singleton Boolean: True or False.
- Integer: the natural toString() representation of the value.
- Float: the natural toString() representation of the value.
- Id: the natural toString() representation of the value.
- DateTime: the W3C representation of the value, of the form yyyy-mm-ddThh:mm:ssZ, as is required by the SQL syntax.
- String: a string value that is surrounded by single quotation marks.
- Object: an object literal of the form OBJECT({id of referenced object}).
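The text forms listed above can be approximated with a small plain-Java helper. This is a sketch under assumptions: the class and method names are hypothetical, java.time.Instant stands in for a DateTime value, Object values are left out, and embedded single quotation marks in strings are not escaped because the list above does not describe escaping.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative model only; not part of the Content Engine API. */
final class SearchParameterTextSketch {

    /** Renders a parameter value in the text form described in the list above. */
    static String toText(Object value) {
        if (value instanceof List) {
            // List: element text forms, comma-separated, inside parentheses.
            return ((List<?>) value).stream()
                    .map(SearchParameterTextSketch::toText)
                    .collect(Collectors.joining(",", "(", ")"));
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? "True" : "False";
        }
        if (value instanceof Instant) {
            // Instant.toString() gives ISO-8601 in UTC with a trailing Z.
            return ((Instant) value).truncatedTo(ChronoUnit.SECONDS).toString();
        }
        if (value instanceof String) {
            // Assumption: no escaping of embedded quotation marks.
            return "'" + value + "'";
        }
        // Integer, Float, and Id values: the natural toString() form.
        return String.valueOf(value);
    }
}
```

For example, toText(List.of(1, 2, 3, 4)) produces (1,2,3,4) and toText("abc") produces 'abc'.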
Custom Search Functions
Custom search functions are functions that you create in an object store that can be used in the SELECT list of a SQL statement for both ad hoc searches and background searches. Each custom search function receives one or more input parameters and outputs a return value. The data type of input parameters can be of any type; however, the return value cannot be a collection object type (cardinality of list or enumeration).
- Custom search functions can only be used with the fetchRows method, not with the fetchObjects method.
- Custom search functions can only be used in ad hoc or background searches in a selection list. They cannot be used in a stored search.
- For searches that combine both a content-based retrieval (CBR) search and a relational search on a database, content search functions are not allowed in searches where the database is searched first. Such searches result in an "Invalid node type" error. For more information, see CBR Query Optimization.
To add a custom search function to an object store, create an instance of the CmSearchFunctionDefinition interface. The CmSearchFunctionDefinition interface is a subinterface of the Action interface and provides handler subinterfaces that you implement with the actions to be taken, coded as JavaScript or Java™ components. A CmSearchFunctionDefinition object identifies an implemented handler with the ProgId property. A handler that is implemented with JavaScript is set on the ScriptText property. A handler that is implemented for Java (JAR or class file) can be checked into a Content Engine object store as a CodeModule object, requiring that the CodeModule property be set. Alternatively, you can set the location of the Java component in the class path of the application server.
In addition to the properties that are present in objects based on the Action class, a CmSearchFunctionDefinition object also includes the CmFunctionName property. This property is populated by the server and specifies the name of the custom search function as it appears in an SQL expression. The search function name must be of the form <namespace>::<name>, where both <namespace> and <name> adhere to the Content Engine symbolic name conventions, and be unique relative to other search function names. The code that is specified by a custom search function must implement the methods of the SearchFunctionHandler interface.
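The <namespace>::<name> form can be checked with a short plain-Java sketch. The class and method names are hypothetical and the function name used with them (acme::Upper) is invented for illustration. The sketch only separates the two parts and rejects stray colons; it does not validate the Content Engine symbolic name conventions or uniqueness, which the text above does not spell out.

```java
/** Illustrative model only; not part of the Content Engine API. */
final class SearchFunctionNameSketch {

    /** Splits a function name into its namespace and name parts. */
    static String[] split(String functionName) {
        int separator = functionName.indexOf("::");
        if (separator < 0) {
            throw new IllegalArgumentException("Expected <namespace>::<name>: " + functionName);
        }
        String namespace = functionName.substring(0, separator);
        String name = functionName.substring(separator + 2);
        // Both parts must be present, and neither may contain further colons.
        if (namespace.isEmpty() || name.isEmpty()
                || namespace.contains(":") || name.contains(":")) {
            throw new IllegalArgumentException("Expected <namespace>::<name>: " + functionName);
        }
        return new String[] { namespace, name };
    }
}
```

For the invented name acme::Upper, split returns the parts acme and Upper; a name without the :: separator is rejected.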
To use a custom search function in an SQL statement, see Custom Search Function query syntax.
This topic is shared by ICS, Filenet 5.5.10. As of: 2023-05-10