Exchange Server crawlers

To collect content from public folders and user mailboxes that are managed by Microsoft Exchange Server, configure an Exchange Server crawler.

Crawler connection credentials

When you create the crawler, you can specify credentials that allow the crawler to connect to the sources to be crawled. You can also configure connection credentials when you specify general security settings for the system. If you use the latter approach, multiple crawlers and other system components can use the same credentials. For example, the search servers can use the credentials when determining whether a user is authorized to access content.

Crawler configuration

Before you configure an Exchange Server crawler, you must configure Exchange Web Services (EWS) on the Exchange Server server to allow the crawler to access content.

When you configure an Exchange Server crawler in the administration console, you:
  • Specify properties that control how the crawler operates and uses system resources. The crawler properties control how the crawler collects content from all servers in the crawl space.
  • Specify information about the Exchange Server server that you want to crawl.

    You must specify a user ID and password so that the crawler can access content on the server. The user ID can be in user principal name (UPN) format or domain format, such as Domain\ExampleAccountName.

  • Select the public folders or personal folders to crawl. The crawler cannot crawl both types of folders in the same crawler session. To include public folders and personal folders in a collection, create separate crawlers.
  • Specify options for making documents searchable. For example, you can exclude certain types of documents from the crawl space.
  • Set up a schedule for crawling the Exchange Server server.
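The two user ID formats mentioned above can be told apart with a simple pattern check. The following is a minimal sketch, not part of the product: the function name `classify_user_id` and the patterns are assumptions made for illustration, and the checks are deliberately looser than the rules that Active Directory enforces.

```python
import re

# Hypothetical helper for illustration only; the crawler does not expose it.
# UPN format: name@domain (no backslash, no whitespace, exactly one "@").
UPN_PATTERN = re.compile(r"^[^@\\\s]+@[^@\\\s]+$")
# Domain format: Domain\name (exactly one backslash, no "@", no whitespace).
DOMAIN_PATTERN = re.compile(r"^[^@\\\s]+\\[^@\\\s]+$")


def classify_user_id(user_id: str) -> str:
    """Return "upn", "domain", or "unknown" for the given account string."""
    if UPN_PATTERN.match(user_id):
        return "upn"
    if DOMAIN_PATTERN.match(user_id):
        return "domain"
    return "unknown"


if __name__ == "__main__":
    for candidate in ("ExampleAccountName@example.com",
                      "Domain\\ExampleAccountName",
                      "ExampleAccountName"):
        print(candidate, "->", classify_user_id(candidate))
```

A check like this might be useful for validating input before it is typed into the administration console. It does not confirm that the account exists or that it has access to the server.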

Compound documents

If a document contains multiple parts, and you want all parts of the document to be treated as a single document in the search results, you can configure the crawler to support compound documents. In this case, a parent document that contains child documents can be searched as a single document. If the search terms are found, all of the child documents are listed with the parent document in the search results. If support for compound documents is not enabled in the crawler configuration, the parent and child documents are searched separately and returned as separate documents in the search results. For more information, read about support for crawling compound documents.
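The difference between the two modes can be illustrated with a toy model. The sketch below is not the product's indexing or search logic. The `search` function, the dictionary layout, and the example of an email message with attachments are all assumptions chosen to show how results are grouped in each mode.

```python
def search(documents, term, compound_support):
    """Toy search over parent documents and their child documents.

    documents: dict of parent_id -> {"text": str, "children": {child_id: str}}
    Returns a list of (document_id, listed_child_ids) tuples.
    """
    results = []
    for parent_id, doc in documents.items():
        children = doc["children"]
        if compound_support:
            # Parent and children are searched as one document; a hit
            # anywhere returns the parent with all of its children listed.
            combined = " ".join([doc["text"], *children.values()])
            if term in combined:
                results.append((parent_id, sorted(children)))
        else:
            # Parent and children are searched and returned separately.
            if term in doc["text"]:
                results.append((parent_id, []))
            for child_id, text in children.items():
                if term in text:
                    results.append((child_id, []))
    return results


# Hypothetical sample data: one message with two attachments.
documents = {
    "mail-1": {
        "text": "quarterly report attached",
        "children": {
            "mail-1/report.pdf": "revenue figures",
            "mail-1/notes.txt": "meeting notes",
        },
    }
}

with_compound = search(documents, "revenue", compound_support=True)
without_compound = search(documents, "revenue", compound_support=False)
```

In this model, `with_compound` holds a single entry for `mail-1` that lists both children, and `without_compound` holds only the child document that contains the term.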

Public folders

The Exchange Server crawler can crawl any number of folders and subfolders on Exchange Server public folder servers. When you create a crawler, you select the content that you want to collect from one public folder server. Later you can edit the crawl space to add content from another server.

User mailboxes

The Exchange Server crawler can crawl any number of personal folders and items in Exchange Server user mailboxes. When you create a crawler, you select the content that you want to collect from one Mailbox server. Later you can edit the crawl space to add content from another server. The crawler can collect content only from user mailboxes, not other types of mailboxes on the Mailbox server.

If you plan to collect content from a Mailbox server, you must deploy a provided web service, ESExchangeServices, on the Mailbox server so that the crawler can access folders, items, and user permissions.

The crawler supports the following types of user mailbox folders and item types. Other types of folders, such as news feeds and RSS feeds, are not supported. User-created folders include all top-level folders that a user creates.
Table 1. Supported user mailbox folders and item types

Personal folders:
  • Inbox
  • Drafts
  • Sent items
  • Outbox
  • Calendar
  • Contacts
  • Tasks
  • Notes
  • User-created folders

Item types:
  • Message (Mail)
  • Calendar
  • Task
  • Contact
  • Notes
  • PostItem
  • MeetingMessage
  • MeetingRequest
  • MeetingResponse
  • MeetingCancellation

Mailbox filters

When you select the mailbox content to crawl, you can specify filters to include and exclude documents. For example, you can select specific types of personal folders that you want to include in the collection. You can also filter content according to:
  • Specific domain controllers that organize the content on the server
  • Specific mailbox servers
  • Specific database servers
  • Specific organizational units (OUs), which include all mailboxes of users who belong to the OU
  • Specific users
If you specify more than one condition for a single filter type, such as two mailbox servers, the list of mailboxes available to be crawled includes mailboxes that match any of those conditions. If you specify more than one filter type, such as a mailbox server and a database server, the list includes only mailboxes that match every filter type.
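The way the filters combine (OR within a filter type, AND across filter types) can be sketched as follows. This is an illustration under stated assumptions, not the crawler's implementation: the `Mailbox` class, its attribute names, the `matches` function, and the sample server names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mailbox:
    # Hypothetical attributes; the real crawler reads these from the server.
    user: str
    mailbox_server: str
    database_server: str


def matches(mailbox, filters):
    """filters maps an attribute name to the set of accepted values.

    Values within one filter type are alternatives (OR).
    Different filter types must all be satisfied (AND).
    An empty set means that no filter is set for that type.
    """
    for attribute, accepted in filters.items():
        if accepted and getattr(mailbox, attribute) not in accepted:
            return False
    return True


mailboxes = [
    Mailbox("alice", "mbx1", "db1"),
    Mailbox("bob", "mbx2", "db2"),
    Mailbox("carol", "mbx3", "db1"),
    Mailbox("dave", "mbx2", "db1"),
]

# Two mailbox servers (either one is acceptable) and one database server.
filters = {
    "mailbox_server": {"mbx1", "mbx2"},
    "database_server": {"db1"},
}

selected = [m.user for m in mailboxes if matches(m, filters)]
print(selected)  # ['alice', 'dave']
```

In this sample, `bob` is excluded because his database server does not match, and `carol` is excluded because her mailbox server does not match, even though each of them satisfies one of the two filter types.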

After you create the crawler, you cannot change the filters. To crawl different personal folders or a different mailbox server, for example, you must create a separate crawler.

Mailbox security

To obtain security data for searching user mailboxes, you must deploy a provided web service, ESCommonServices, on the Exchange Server Mailbox server. This service enables Watson Explorer Content Analytics to obtain the group lists and permissions that are necessary for pre-filtering access controls.