Testing the Crawl
About this task
Once the IBM WCM connector has been installed into your Watson™ Explorer Engine installation, you will likely want to run several tests to confirm that it is functioning correctly. The following steps are recommended for testing the IBM WCM connector:
- In the Watson Explorer Engine administration tool's Configuration tab, navigate to your site collection seed configuration display.
- Test It - Click Test It to verify initial connectivity and that the IBM WCM connector is successfully communicating with IBM WCM.
- Confirm Authentication - Once connectivity has been successfully tested, confirm that you can authenticate with a user account by using its user name and password.
- Null Search - Perform a null search by simply clicking the search button in your Watson Explorer Engine or Application Builder search display. If results are not displayed, something is likely not configured properly. A null search should return all results.
- Document Search - If results are successfully displayed, then test the search bar by searching for a document that is already known to be part of the index. For instance, an administrator may search for a document that they added to IBM WCM previously and therefore know that this document should be returned as a search result by the IBM WCM connector.
- Security/ACLs - If a known document is successfully returned in the search results, then test security and ACLs (access control lists) by searching first for a document to which access should be granted based on the current account permissions, and then for a document that should not be accessible based on the current account permissions. During testing, you must confirm that users can only access documents whose security requirements are satisfied by their current security and access permissions. You should test security and ACL support by using several accounts with different IBM WCM permissions.
- Refresh Capability - Once security and ACL behavior has been confirmed, test the refresh capability of the IBM WCM connector. Search for a document and note exactly what is returned for that search. Locate the source document and modify it. Click refresh. Search again and confirm that the modified document is returned with its changes. Pay attention to any time interval settings configured in the site collection seed configuration.
Note: A full refresh re-crawls the metadata of the documents. Because of the seedlist nature of the WCM API, most (if not all) of the content might be re-crawled as well, so a refresh might not be significantly faster than a full crawl. However, binary documents that are attached as links to the seedlist XML are not crawled during a refresh.
- Document Count - At this point, test the overall document count in the search results. Perform a search query and note the number of results. Add a document to IBM WCM and perform the same search. The total number of results should now be one greater than before.
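When the Security/ACLs test in the list above is repeated with several accounts, it can help to write down what each account is expected to see and compare that with what the searches actually return. The following Python sketch is illustrative only. The account names and document names are made up, the observed results are entered by hand from the search display, and `find_acl_mismatches` is a hypothetical helper, not part of Watson Explorer Engine or the IBM WCM connector.

```python
def find_acl_mismatches(expected, observed):
    """Compare expected document visibility with observed search results.

    expected: dict mapping account -> {document: True if it should be visible}
    observed: dict mapping account -> set of documents the search returned
    Returns a sorted list of (account, document, problem) tuples, where
    problem is "missing" (should be visible but was not returned) or
    "leaked" (should be hidden but was returned).
    """
    mismatches = []
    for account, docs in expected.items():
        seen = observed.get(account, set())
        for doc, should_see in docs.items():
            if should_see and doc not in seen:
                mismatches.append((account, doc, "missing"))
            elif not should_see and doc in seen:
                mismatches.append((account, doc, "leaked"))
    return sorted(mismatches)


if __name__ == "__main__":
    # Hypothetical accounts and documents, for illustration only.
    expected = {
        "editor": {"draft.html": True, "hr-policy.pdf": False},
        "hr": {"hr-policy.pdf": True},
    }
    observed = {
        "editor": {"draft.html"},
        "hr": {"hr-policy.pdf"},
    }
    for account, doc, problem in find_acl_mismatches(expected, observed):
        print(f"{account}: {doc} is {problem}")
```

An empty result means every account saw exactly what its permissions allow. A "leaked" entry is the more serious finding, because it indicates that a document was returned to an account that should not have access to it.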
Many of the tests here can safely be performed in a production environment. However, it may be harder to determine their effect. For example, if you are adding documents while other users are adding documents, it may be difficult to confirm that your document is the one that directly impacted the search results.
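The document count comparison can be recorded as a simple before/after check that also allows for the ambiguity of a shared environment. The following Python sketch is illustrative only. The counts are read by hand from the search display, and `check_count_delta` is a hypothetical helper, not part of Watson Explorer Engine or the IBM WCM connector.

```python
def check_count_delta(before, after, added=1):
    """Classify a before/after result count for the document count test.

    before, after: result totals noted for the same query
    added: number of documents you added to IBM WCM between the two searches
    """
    if before < 0 or after < 0 or added < 0:
        raise ValueError("counts must be non-negative")
    delta = after - before
    if delta == added:
        return "pass"
    if delta > added:
        # More new results than you added: other users may have added
        # documents at the same time, so the test proves little.
        return "inconclusive"
    # The expected increase was not observed.
    return "fail"


if __name__ == "__main__":
    print(check_count_delta(before=120, after=121))
```

In a production environment, an "inconclusive" result is a reason to repeat the test with a query that matches only your own test document, rather than evidence of a connector problem.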
The IBM WCM connector should successfully perform each of the tests in the previous list. If not, you may not have configured the IBM WCM connector correctly, or an issue may exist in your computing environment that is preventing successful connectivity and document retrieval. The next section lists common steps to identify and debug such problems.