IO SharePoint Connector
This guide describes how to install and use the IO SharePoint connector for IBM® Watson™ Explorer Engine. The IO SharePoint connector enables Watson Explorer Engine applications to crawl SharePoint repositories and index the information that they contain.
The IO SharePoint connector allows you to crawl SharePoint objects, such as documents, user profiles, site collections, blogs, list items, membership lists, directory pages, and more. In each case, the object is indexed with its associated metadata and its attachments (for list items and documents).
The more significant features and benefits of the IO SharePoint connector are as follows:
- Continuous Update - Continuous Update mode is the most significant enhancement compared to previous SharePoint connectors for Watson Explorer Engine. When the IO SharePoint connector is configured to operate in Continuous Update mode, new, updated, and deleted SharePoint data is continuously indexed. As a result, SharePoint repository updates, document modifications, and other changes can be searched with Watson Explorer Engine applications as quickly as possible. The Watson Explorer Engine crawler should never need to be manually stopped and restarted as long as Watson Explorer Engine is running.
- Improved Handling of User Profiles - The IO SharePoint connector enables Watson Explorer Engine to index both public and private data while maintaining proper SharePoint ACLs (Access Control Lists).
- Decreased Memory Footprint - The IO SharePoint connector is less resource intensive than the legacy Watson Explorer Engine SharePoint connector.
- Configurable Resource Levels - Watson Explorer Engine administrators can now scale system resources up or down depending on how aggressively they wish to crawl a SharePoint site or SharePoint site collection. Cache size is also configurable.
- No Custom Watson Explorer Engine Web Services - The IO SharePoint connector eliminates the need for installing a custom Watson Explorer Engine web service on the SharePoint server(s). Note: The custom web services were required by the legacy Watson Explorer Engine SharePoint connector to enable certain features. The IO SharePoint connector provides all of the same features without the need for custom web services.
- ASPX Page Handling - The IO SharePoint connector can crawl and index SharePoint site folders, so that folder directory pages can be returned in search results.
- Improved Blog Crawling - Due to improved URL handling, the IO SharePoint connector generates better search results when it crawls SharePoint sites and blogs.
- Improved Versioned Document Handling - Due to improved URL handling, the IO SharePoint connector provides for better crawling and indexing of current and previous versions of SharePoint site documents.
- Profiles Crawled Without MySites - User profiles can be crawled without the need for having a SharePoint MySite. Note: With previous SharePoint connectors, a user profile could be crawled only if it had a "MySite" associated with it. This is no longer the case and is covered in greater detail in the Configuring User Profiles section of this guide.
- More Control Over SharePoint Configurations - You can now control whether or not the connector honors the SharePoint configuration for site collections, sites, or lists which should not be indexed.
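The idea behind Continuous Update mode, described in the first item above, can be illustrated with a minimal conceptual sketch. It is not the connector's implementation or API: the `Change` and `Index` classes and their fields are hypothetical names invented for this illustration. The sketch only shows how a stream of add, update, and delete events can keep an index current without a full recrawl.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Change:
    """Hypothetical change event reported by a repository."""
    kind: str                      # "add", "update", or "delete"
    url: str                       # identifies the changed object
    content: Optional[str] = None  # new content; unused for deletes


@dataclass
class Index:
    """Hypothetical in-memory index keyed by object URL."""
    docs: Dict[str, str] = field(default_factory=dict)

    def apply(self, changes: Iterable[Change]) -> None:
        # Apply each change in order so the index tracks the repository.
        for change in changes:
            if change.kind in ("add", "update"):
                self.docs[change.url] = change.content or ""
            elif change.kind == "delete":
                # Deleting an object that was never indexed is a no-op.
                self.docs.pop(change.url, None)
            else:
                raise ValueError(f"unknown change kind: {change.kind}")


index = Index()
index.apply([
    Change("add", "site/doc1", "first draft"),
    Change("update", "site/doc1", "second draft"),
    Change("add", "site/doc2", "temporary"),
    Change("delete", "site/doc2"),
])
```

After these four events the sketch's index holds only `site/doc1` with its updated content, which mirrors the behavior described above: new, updated, and deleted data is reflected incrementally rather than by stopping and restarting a crawl.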
This document is intended for use by systems administrators tasked with installing and configuring the IO SharePoint connector and crawling SharePoint sites. Other key audience members include IT management and personnel generally responsible for the maintenance of SharePoint, Watson Explorer Engine, and Watson Explorer Application Builder applications.
A working knowledge of Watson Explorer Engine and a basic understanding of Watson Explorer Engine administration and configuration are required to follow along in this guide. A similar background in SharePoint administration is assumed.
The last few sections of this guide contain release notes, which consist of a version-specific change log followed by a known issues section.