The JCR text-search component is used to search for content and other artifacts from the authoring portlet in IBM Lotus Web Content Manager (WCM). It is also used when a search is performed on pages that use predefined content templates from the Content Template Catalog. Internally, JCR search uses the WebSphere Portal Search Engine (PSE) for its text-search functions; PSE adopts the Haifa Research Lab's search engine, which is based on Apache Lucene.
In WebSphere Portal 8.5, unlike previous portal versions, the JCR properties that were specified in the icm.properties file have been moved to Resource Environment Providers in the admin console. They are now custom properties of the JCR ConfigService PortalContent provider, as shown in Figure 1.
Figure 1 – JCR Properties
Configure JCR Content Model Search
To configure text search for a base or virtual portal, we need to ensure that the following properties are set in the JCR ConfigService provider.
1. Set the JCR text search property to true:
jcr.textsearch.enabled = true
Set this property to false to disable text search. When text search is disabled, search in the authoring portlet does not work, and the crawler collects no new documents for indexing.
2. Enable the document conversion service by setting the following property:
jcr.textsearch.convertor = com.ibm.icm.ts.convertor.WpsConvertor
Other values are available for this property, but the value above is recommended. If this property is not set correctly, rich text in web content and text in attachments are not searchable. For more information about convertor options, refer to the article.
3. Specify a valid absolute path for the index directory property, for example:
jcr.textsearch.indexdirectory = /opt/IBM/WebSphere/wp_profile/PortalServer/jcr/searchIndexes
JCR collections are created at the path specified in this property.
4. For a stand-alone environment, set the PSE type property to localhost:
jcr.textsearch.PSE.type = localhost
The other options are Simple Object Access Protocol (SOAP) and Enterprise JavaBeans (EJB), which are used to configure a remote search service for a clustered environment.
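If the custom properties are copied out of the admin console into a simple key/value form, the four settings above can be checked with a short script. The sketch below is illustrative only: it assumes the properties are available as a Python dictionary of strings and that the index directory is a POSIX-style path like the example; the function name check_jcr_search_properties is hypothetical and not part of any portal API.

```python
from pathlib import PurePosixPath

# Recommended convertor value from step 2.
RECOMMENDED_CONVERTOR = "com.ibm.icm.ts.convertor.WpsConvertor"


def check_jcr_search_properties(props: dict) -> list:
    """Return a list of problems found in the JCR text-search properties.

    Hypothetical helper: `props` is assumed to hold the custom properties of
    the JCR ConfigService provider as plain strings. The PSE type check
    assumes a stand-alone environment.
    """
    problems = []
    if props.get("jcr.textsearch.enabled", "").strip().lower() != "true":
        problems.append("jcr.textsearch.enabled must be true")
    if props.get("jcr.textsearch.convertor", "").strip() != RECOMMENDED_CONVERTOR:
        problems.append("jcr.textsearch.convertor is not the recommended WpsConvertor")
    index_dir = props.get("jcr.textsearch.indexdirectory", "").strip()
    # Assumes a POSIX path such as /opt/IBM/...; Windows paths are not handled.
    if not index_dir or not PurePosixPath(index_dir).is_absolute():
        problems.append("jcr.textsearch.indexdirectory must be an absolute path")
    if props.get("jcr.textsearch.PSE.type", "").strip() != "localhost":
        problems.append("jcr.textsearch.PSE.type should be localhost for a stand-alone environment")
    return problems
```

Passing the example values from steps 1 to 4 returns an empty list, while changing jcr.textsearch.enabled to false reports only that property.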
The Manage Search portlet is used to administer search services, collections, and scopes.
JCR Collection out-of-the-box creation
In a stand-alone portal, when we navigate to Search Collections in the Manage Search portlet, only the Default collection is listed (shown in Figure 2). On a fresh installation, JCR collections are not shown until search indexing is triggered; triggering indexing creates the JCR collection out of the box.
Figure 2 – Manage Search Portlet showing available collections in new portal
To trigger the out-of-the-box creation of the JCR collection, ensure that the properties mentioned above are set, and then perform the following action.
Navigate to the WCM authoring portlet through Applications > Content > Web Content Authoring > Web Content Library, and create, modify, or delete a content item. After the content is updated, navigate to the Manage Search portlet and click the Refresh button to see the JCR collection (shown in Figure 3).
Figure 3 – Manage Search Portlet showing JCR Collection.
A JCRCollection<workspace_id>.properties file is created for each JCR collection in the directory specified by the jcr.textsearch.indexdirectory property. This file contains the configuration parameters required to create the collection manually, which you might need to do if the JCR collection has been deleted. You can then recreate the collection by referring to the properties in the JCRCollection<workspace_id>.properties file. Figure 4 shows the index directory containing the properties file, and Figure 5 shows the contents of the properties file.
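To see which JCR collections have properties files in the index directory, a script can match the JCRCollection<workspace_id>.properties naming pattern. The sketch below does not interpret any specific keys, since their names are only visible in Figure 5; read_simple_properties and find_collection_files are hypothetical helpers, and the parser assumes simple key=value lines, ignoring the continuation lines, escapes, and ":" separators that the full Java .properties format allows.

```python
import re
from pathlib import Path

# File-name pattern described above: JCRCollection<workspace_id>.properties
COLLECTION_FILE = re.compile(r"^JCRCollection(\d+)\.properties$")


def read_simple_properties(path: Path) -> dict:
    """Parse key=value lines, skipping blank lines and # or ! comments.

    Simplified reader (assumption): no line continuations or escape handling.
    """
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def find_collection_files(index_dir: Path) -> dict:
    """Map workspace id -> parsed properties for each JCRCollection file."""
    result = {}
    for path in sorted(index_dir.iterdir()):
        match = COLLECTION_FILE.match(path.name)
        if match and path.is_file():
            result[int(match.group(1))] = read_simple_properties(path)
    return result
```

Pointed at the example index directory, find_collection_files should return a dictionary keyed by workspace ID, with key 1 present when JCRCollection1.properties exists.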
Figure 4 – TextSearch index directory listing the properties file for JCRCollection1
Figure 5 – Contents of JCRCollection1.properties file.
Creating JCR Collection manually
If the out-of-the-box collection is not created automatically, you need to create the JCR collection manually. Before you do so, refer to the troubleshooting section for failed collection creation. Creating the collection manually is not recommended unless the L2/L3 team advises it.
A JCR collection is named using the format JCRCollection<wsid>, where wsid is the workspace ID. Base portal content is stored in the JCR ROOTWORKSPACE, whose ID is 1, so the collection must be named JCRCollection1.
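The naming convention can be captured in two small helpers. This is a sketch with hypothetical function names; it only encodes the JCRCollection<wsid> pattern described above and assumes workspace IDs are non-negative integers.

```python
import re

# Base portal content lives in ROOTWORKSPACE, whose id is 1.
BASE_PORTAL_WORKSPACE_ID = 1
_NAME_PATTERN = re.compile(r"^JCRCollection(\d+)$")


def collection_name(workspace_id: int) -> str:
    """Build the collection name JCRCollection<wsid> for a workspace id."""
    return f"JCRCollection{workspace_id}"


def workspace_id_from_name(name: str) -> int:
    """Recover the workspace id from a name such as JCRCollection1."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a JCR collection name: {name!r}")
    return int(match.group(1))
```

For example, collection_name(BASE_PORTAL_WORKSPACE_ID) returns "JCRCollection1", and workspace_id_from_name("JCRCollection3") returns 3.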
To create the JCR collection manually in a stand-alone environment:
1. From the portal Administration menu, go to Search Administration > Manage Search > Search Collections. Click the New Collection button (shown in Figure 6).
Figure 6 – Manage Search Portlet showing button to be clicked for collection creation.
2. In the Create Search Collection form, select Default Search Service as the search service, enter JCRCollection1 as the collection name (based on the naming convention JCRCollection<wsid>), and select English as the collection language. Click OK (shown in Figure 7).
Figure 7 – Create new collection form
A message confirms that the collection was created, and JCRCollection1 appears under the collections in the portlet (shown in Figure 8).
Figure 8 – Manage Search Portlet with successful creation message and newly created collection JCRCollection1
3. Click the newly created JCRCollection1, and then click New Content Source to create a content source for it (shown in Figure 9). The content source manages the indexing of documents, and the crawler details are specified in it.
Figure 9 – Manage Search Portlet showing New Content Source button.
In the New Content Source form, select Seedlist Provider as the content source type, enter a name for the new content source (in this case, JCRContentSource), and enter the following value as the URL: http://server_name:port_number/wps/seedlist/server?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=5&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLER1 (shown in Figure 10).
Figure 10 – New Content Source form
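We gave the following value for the URL in the New Content Source form:
http://server_name:port_number/wps/seedlist/server?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=5&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLER1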
The URL above has the parameters Action, Format, Locale, Range, Source, Start, and SeedlistId.
The SeedlistId parameter combines the workspace ID with a unique identifier, in the format <workspaceID>@<unique identifier>. This article uses OOTB_CRAWLER<workspaceID> as the unique identifier, so the value takes the form <workspaceID>@OOTB_CRAWLER<workspaceID>.
The workspace ID is 1 for the base portal, and the unique identifier can be any value; we chose OOTB_CRAWLER1 for simplicity. A virtual portal has a different workspace ID, so its SeedlistId parameter differs from that of the base portal.
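The crawler URL can be assembled from its parameters, which makes its dependency on the workspace ID explicit. The sketch below is not an IBM API: seedlist_id and crawler_url are hypothetical helpers, the host name and port used with them are placeholders, and the default Range of 5 simply mirrors the example URL. It keeps the "@" in SeedlistId unescaped so that the result matches the URL shown above.

```python
from urllib.parse import urlencode

# Retriever factory class named in the content-source URL above.
JCR_RETRIEVER = "com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory"


def seedlist_id(workspace_id, unique_id=None):
    """Return <workspaceID>@<unique identifier>.

    Defaults to the OOTB_CRAWLER<workspaceID> identifier used in this article.
    """
    if unique_id is None:
        unique_id = f"OOTB_CRAWLER{workspace_id}"
    return f"{workspace_id}@{unique_id}"


def crawler_url(host, port, workspace_id, range_size=5, locale="en_US"):
    """Build the seedlist crawler URL for a content source (hypothetical helper)."""
    params = {
        "Action": "GetDocuments",
        "Format": "ATOM",
        "Locale": locale,
        "Range": range_size,   # documents per page of a crawler session
        "Source": JCR_RETRIEVER,
        "Start": 0,
        "SeedlistId": seedlist_id(workspace_id),
    }
    # safe="@" leaves the separator in SeedlistId unescaped.
    query = urlencode(params, safe="@")
    return f"http://{host}:{port}/wps/seedlist/server?{query}"
```

For example, crawler_url("portal.example.com", 10039, 3) yields a URL ending in SeedlistId=3@OOTB_CRAWLER3, which corresponds to a virtual portal whose workspace ID is 3.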
The Range parameter specifies the number of documents in a single page of a crawler session.
The retriever sends its response to the crawler in Atom XML format, split into pages that each contain Range documents.
For example, if 100 updates are to be indexed and the range is set to 10, the retriever sends 10 pages to the crawler, each containing 10 documents. By default, the range value is 100.
The administrator can change the Range parameter in the URL to suit the portal's requirements. The crawler has an internal timeout of 10 minutes; if the portal is too slow for the retriever to retrieve 100 documents within 10 minutes, the administrator can reduce the range value.
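The relationship between the number of pending updates and the Range parameter is integer division. The sketch below, using the hypothetical function crawler_pages, lists the page sizes the retriever would send; it assumes that when the total is not a multiple of the range, the last page carries the remainder, which the text above does not state explicitly.

```python
def crawler_pages(total_updates, range_size):
    """Return the size of each page sent for `total_updates` documents.

    Assumption: pages hold `range_size` documents each, and a final,
    smaller page holds any remainder.
    """
    if range_size <= 0:
        raise ValueError("range must be a positive integer")
    full_pages, remainder = divmod(total_updates, range_size)
    return [range_size] * full_pages + ([remainder] if remainder else [])
```

crawler_pages(100, 10) gives ten pages of 10 documents, matching the example above; crawler_pages(105, 10) gives eleven pages, the last holding 5 documents.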
When the content source is created successfully, a message is displayed at the top of the page.
Note: These instructions for creating the JCRCollection1 collection are also described in the WebSphere product documentation topic “Setting up JCR search collections”. With this collection and content source, you can search for items in the WCM authoring portlet.
If JCRCollection1 is created manually, you must configure a schedule for the content source so that the crawler runs automatically at the configured interval.
To do this:
1. Use the Schedulers tab to set the frequency with which the crawler should run to update the search content (shown in Figure 11).
Figure 11 – Content Source scheduler tab view
2. Choose the date and time when the crawler should start running, choose the update interval, and then click Create.
When JCRCollection1 is created automatically by the application, its index maintenance is scheduled to run every 60 minutes. To change this frequency, delete the existing scheduled update in the scheduler, choose the day, time, and interval, and click Create.
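A scheduled update amounts to a start time plus a fixed interval. The sketch below, using the hypothetical helper next_crawl_times, lists the first few run times for a given start and interval; it models only the arithmetic and does not read or change the portal scheduler.

```python
from datetime import datetime, timedelta


def next_crawl_times(start, interval_minutes=60, count=3):
    """Return the first `count` run times of a schedule.

    The schedule starts at `start` and repeats every `interval_minutes`;
    60 minutes matches the default described above.
    """
    step = timedelta(minutes=interval_minutes)
    return [start + i * step for i in range(count)]
```

With a start of 6 June 2014 at 18:00 and the default 60-minute interval, the first three runs fall at 18:00, 19:00, and 20:00.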
Creating JCR Collection manually for Virtual Portal
The procedure is the same as the one described above for the base portal, except that the crawler URL of a virtual portal differs from that of the base portal.
Note: All stand-alone environments must use Default Search Service as the search service.
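Base portal crawler URL:
http://server_name:port_number/wps/seedlist/server?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=5&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLER1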
Because a virtual portal uses a different workspace, its workspace ID differs from that of the base portal, and so does the SeedlistId parameter in its crawler URL.
For example, if the workspace ID of a virtual portal is 3, the SeedlistId parameter in its crawler URL would be 3@OOTB_CRAWLER3, following the convention used above.
To determine the workspace ID of a virtual portal:
1. Enable the JCR text search trace string com.ibm.icm.ts.*=finest in Administration > Portal Analysis > Enable Tracing.
2. Add or modify any WCM document in the virtual portal and save it. The trace log then records the workspace ID.
In trace.log, you will find trace entries similar to the following:
[6/6/14 18:51:04:337 IDT] 000001c3 BaseDBImpl 3 insertSeedlistEvents:Inserted event:Event:action='Update_Node(3)', timestamp='2014-06-06 18:51:04.337', document id=<workspace: 3, itemid: AB001001N13F05B8320005B295>', parentID:<workspace: 3, itemid: >', wsid: 3
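Reading the workspace ID from trace.log can be automated by searching insertSeedlistEvents entries for the wsid field. The sketch below assumes each trace entry sits on a single line shaped like the sample above; the helper names are hypothetical, and the derived SeedlistId follows the OOTB_CRAWLER<workspaceID> convention used in this article rather than a required value.

```python
import re

# Matches the "wsid: <n>" field of an insertSeedlistEvents trace entry.
WSID_PATTERN = re.compile(r"\bwsid:\s*(\d+)")


def workspace_id_from_trace(line):
    """Return the workspace id from one trace.log entry, or None if absent."""
    match = WSID_PATTERN.search(line)
    return int(match.group(1)) if match else None


def workspace_ids_in_log(lines):
    """Collect the distinct workspace ids seen in insertSeedlistEvents entries."""
    ids = set()
    for line in lines:
        if "insertSeedlistEvents" in line:
            wsid = workspace_id_from_trace(line)
            if wsid is not None:
                ids.add(wsid)
    return ids


def crawler_settings_for(workspace_id):
    """Derive the collection name and SeedlistId from the naming conventions above."""
    return {
        "collection": f"JCRCollection{workspace_id}",
        "seedlist_id": f"{workspace_id}@OOTB_CRAWLER{workspace_id}",
    }
```

For the sample entry above, workspace_id_from_trace returns 3, and crawler_settings_for(3) gives JCRCollection3 and 3@OOTB_CRAWLER3. If the base portal was also edited while tracing, workspace_ids_in_log would report 1 as well.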