Basic IO SharePoint Connector Configuration

About this task

The following steps are the minimum required to use the new IO SharePoint connector.

Procedure

  1. Create a new Watson™ Explorer Engine search collection.
  2. In the Configuration tab of your Watson Explorer Engine search collection, click Add a new seed.
  3. In the pop-up window that opens, select IO SharePoint from the scrollable list of available connectors, and click Add.
  4. In the Web Application or Site Collections URLs box, enter the URL(s) of your SharePoint site(s).
    Note: Use the same Web Application or Site Collection URL as it appears in Central Administration. If you use a non-root Site Collection URL, referred to as a Managed Path URL, see the section Filtering URLs At Crawl Time for more information.
  5. In the Username field, enter the SharePoint user name of the account to use for crawling.
    Important: The SharePoint account that you use to crawl your SharePoint sites must have at least full read access privileges, as described in Creating A Crawling Account In SharePoint.

  6. In the Password field, enter the crawling account's SharePoint password. The actual password that you enter is not displayed. Enter that password again in the box below to confirm that you entered it correctly.
  7. In the Authentication Type drop-down menu, select the authentication method for your SharePoint deployment: BASIC, NTLM2, KERBEROS, or CLAIMS_BASED_AUTHENTICATION. The default authentication type is NTLM2. If the target is SharePoint Online, select CLAIMS_BASED_AUTHENTICATION.
    Note: If you are using Windows Claims Based Authentication (CBA), select the appropriate authentication type and set the ACLs contain Claims option in the Crawling and ACLs section to on.
  8. Check Use SharePoint Online Authentication if the target server is SharePoint Online.
  9. Set the Seed URL Type.

    This setting identifies what type of SharePoint object the provided seed URLs point to: site collections or web applications (also known as virtual servers). If the seed URL type is set to Site Collections, then only the children of the site collection referenced by the URL are crawled. If the Seed URL type is set to Web Applications, then all of the site collections (and their children) belonging to the web applications referenced by each URL are crawled.

    Note: SharePoint Online does not have Web Applications. To crawl multiple site collections on SharePoint Online, you must therefore specify the URL of every site collection as a seed URL. Site collections do not have a parent-child relationship: even if you specify https://<server1>.sharepoint.com as the seed URL, the connector does not crawl another site collection such as https://<server1>.sharepoint.com/sites/<anothersitecollection>, although the URLs might suggest a relationship between them.
  10. Click OK/Apply to save the updated configuration information for the IO SharePoint connector.

    After saving your changes, you will be able to use the commands on the Overview tab of the Watson Explorer Engine Administration tool. At this point, you can initiate a crawl of SharePoint sites on the specified server(s) using the IO SharePoint connector. See IO SharePoint Advanced Configuration and Performance Tuning for more information.

    Warning: Be sure you have selected the appropriate authentication method. The IO SharePoint connector will not function correctly if the wrong Authentication Type is selected. NTLM2 is selected by default.
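
The rules in steps 7 through 9 can be summarized as a small pre-flight check. The following Python sketch is illustrative only and is not part of Watson Explorer Engine: the dictionary keys, the function names, and the assumption that SharePoint Online hosts end in .sharepoint.com are all hypothetical. It only restates the defaults and SharePoint Online requirements described in the procedure.

```python
from urllib.parse import urlparse

AUTH_TYPES = {"BASIC", "NTLM2", "KERBEROS", "CLAIMS_BASED_AUTHENTICATION"}
SEED_URL_TYPES = {"Site Collections", "Web Applications"}

# Defaults named in the procedure; the dictionary keys are hypothetical.
DEFAULTS = {
    "authentication_type": "NTLM2",
    "seed_url_type": "Site Collections",
    "use_sharepoint_online_authentication": False,
}


def is_sharepoint_online(url):
    """Assumption: SharePoint Online hosts end in .sharepoint.com."""
    host = urlparse(url).hostname or ""
    return host.lower().endswith(".sharepoint.com")


def check_seed_settings(settings):
    """Return a list of problems found in a dict of seed settings."""
    merged = {**DEFAULTS, **settings}
    problems = []
    urls = merged.get("urls") or []
    if not urls:
        problems.append("no Web Application or Site Collection URL given")
    for key in ("username", "password"):
        if not merged.get(key):
            problems.append(f"{key} is missing")
    if merged["authentication_type"] not in AUTH_TYPES:
        problems.append("unknown Authentication Type")
    if merged["seed_url_type"] not in SEED_URL_TYPES:
        problems.append("unknown Seed URL Type")
    if any(is_sharepoint_online(u) for u in urls):
        if merged["authentication_type"] != "CLAIMS_BASED_AUTHENTICATION":
            problems.append("SharePoint Online needs CLAIMS_BASED_AUTHENTICATION")
        if not merged["use_sharepoint_online_authentication"]:
            problems.append("Use SharePoint Online Authentication is not checked")
        if merged["seed_url_type"] == "Web Applications":
            problems.append("SharePoint Online has no Web Applications")
    return problems


def unseeded_site_collections(seed_urls, site_collection_urls):
    """List site collections that are not seeds themselves.

    Matching is exact (ignoring case and a trailing slash), never by URL
    prefix, because site collections have no parent-child relationship.
    """
    seeds = {u.rstrip("/").lower() for u in seed_urls}
    return [u for u in site_collection_urls if u.rstrip("/").lower() not in seeds]
```

For example, passing only urls, username, and password for an on-premises server yields no problems because the NTLM2 and Site Collections defaults apply, whereas the same minimal settings for a SharePoint Online URL report the missing claims-based settings.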

Example

Crawling SharePoint Server - NTLM Authentication
Table 1. Seed Component Section
Field                                     Value
Web Application or Site Collection URLs   https://localsharepoint.example.com
Username                                  Administrator
Password                                  <password>
Seed URL Type                             default (Site Collections)
Authentication Type                       default (NTLM2)

Crawling SharePoint Server - ADFS Authentication
Table 2. Seed Component Section
Field                                     Value
Web Application or Site Collection URLs   https://yoursharepoint.example.com
Username                                  Administrator@example.com
Password                                  <password>
Seed URL Type                             default (Site Collections)
Authentication Type                       CLAIMS_BASED_AUTHENTICATION
Table 3. Claims Based Authentication Section
Field                                                Value
Security Token Service Endpoint                      https://youradfs.example.com/adfs/services/trust/2005/UsernameMixed
Relaying Party Trust Identifier (AppliesTo, Realm)   urn:sharepoint:yourrelayingpartytrustid

Crawling SharePoint Online - Native Authentication

If your crawler username contains a domain name ending with "onmicrosoft.com", native authentication is likely used. Please check with your SharePoint administrator for details on your SharePoint configuration.

Table 4. Seed Component Section
Field                                     Value
Web Application or Site Collection URLs   https://AAAA.sharepoint.com
Username                                  azureuser@AAAA.onmicrosoft.com
Password                                  <password>
Seed URL Type                             default (Site Collections)
Authentication Type                       CLAIMS_BASED_AUTHENTICATION
Use SharePoint Online Authentication      Checked
Table 5. Claims Based Authentication Section
Field                             Value
Security Token Service Endpoint   https://login.microsoftonline.com/extSTS.srf

Crawling SharePoint Online - ADFS Authentication (v12.0.2.2 or later)

If your crawler username contains your own domain name such as "example.com", ADFS authentication will be used. Please check with your SharePoint administrator for details on your SharePoint configuration.

Table 6. Seed Component Section
Field                                     Value
Web Application or Site Collection URLs   https://BBBB.sharepoint.com
Username                                  username@example.com
Password                                  <password>
Seed URL Type                             default (Site Collections)
Authentication Type                       CLAIMS_BASED_AUTHENTICATION
Use SharePoint Online Authentication      Checked
Table 7. Claims Based Authentication Section
Field                                                Value
Security Token Service Endpoint                      https://adfs.example.com/adfs/services/trust/2005/usernamemixed
Relaying Party Trust Identifier (AppliesTo, Realm)   urn:federation:MicrosoftOnline
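
The two SharePoint Online examples differ only in the Claims Based Authentication section, and the username domain hints at which one applies. The following Python sketch restates that rule of thumb. The function name, the adfs_host parameter, and the idea of building the ADFS endpoint from a host name are assumptions made for illustration; the endpoint path and identifiers are copied from Tables 5 and 7. Treat the result as a starting point and confirm it with your SharePoint administrator.

```python
# Values copied from the SharePoint Online example tables.
NATIVE_STS_ENDPOINT = "https://login.microsoftonline.com/extSTS.srf"
ADFS_ENDPOINT_PATH = "/adfs/services/trust/2005/usernamemixed"
ONLINE_ADFS_TRUST_ID = "urn:federation:MicrosoftOnline"


def guess_online_claims_settings(username, adfs_host=None):
    """Guess the Claims Based Authentication fields for SharePoint Online.

    Hypothetical helper: a username domain ending in onmicrosoft.com
    suggests native authentication; any other domain suggests ADFS.
    """
    if "@" not in username:
        raise ValueError("expected a username of the form user@domain")
    domain = username.rpartition("@")[2].lower()
    if domain.endswith("onmicrosoft.com"):
        return {
            "flavor": "native",
            "Security Token Service Endpoint": NATIVE_STS_ENDPOINT,
        }
    # For ADFS the endpoint depends on your own ADFS server; leave it
    # unset (None) when the host name is not known.
    endpoint = f"https://{adfs_host}{ADFS_ENDPOINT_PATH}" if adfs_host else None
    return {
        "flavor": "adfs",
        "Security Token Service Endpoint": endpoint,
        "Relaying Party Trust Identifier (AppliesTo, Realm)": ONLINE_ADFS_TRUST_ID,
    }
```

Calling the helper with azureuser@AAAA.onmicrosoft.com reproduces the value in Table 5, and calling it with username@example.com and adfs_host="adfs.example.com" reproduces the values in Table 7.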