Deployment footprint of Content Collector for Microsoft SharePoint Online

An overview of the Content Collector for Microsoft SharePoint components, the authorization protocol used, nuances of a hybrid environment, and the architectural and functional differences between Microsoft SharePoint and Microsoft SharePoint Online.

Content Collector for Microsoft SharePoint Online enables users to archive, stub, view, or restore archived content in Microsoft SharePoint Online. It also enables users to preview and restore documents that were already archived from libraries that have been migrated from Microsoft SharePoint on-premises to Microsoft SharePoint Online, after a specific task route is run.

IBM Content Collector for Microsoft SharePoint Online components

This section describes which components Content Collector for Microsoft SharePoint Online installs, which SharePoint integration options it uses, and how it processes the data. Understanding this helps you assess your installation and implementation and ensure that it is complete and correct.

Content Collector for Microsoft SharePoint Online needs the following components to be deployed on SharePoint Online:

IBM® Content Collector app and Restore app

A separate installer installs the following apps:

  • IBM Content Collector app: This app adds one content type and three site columns to a SharePoint site collection. This approach ensures that the additional content type and columns are created only where needed.
  • Restore app: This app restores documents from the target repository back to SharePoint Online by communicating with the Link Handler and Restore apps.

Link Handler web application

This web application must be deployed on a server that has Internet Information Services (IIS) installed.

The Link Handler redirects shortened URLs to their full-length targets. This is necessary because the length of site column URLs in SharePoint is limited to 260 characters.
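
The redirection idea can be illustrated with a minimal Python sketch. It is not the Link Handler's implementation: the `LinkStore` class, the hash-based token scheme, and the example host names are all assumptions made for illustration. Only the 260-character limit comes from the text above.

```python
import hashlib

# Limit on site column URL length, as stated in the text above.
MAX_COLUMN_URL_LENGTH = 260


class LinkStore:
    """Hypothetical in-memory store that maps short tokens to full URLs."""

    def __init__(self, handler_base_url: str) -> None:
        self.handler_base_url = handler_base_url.rstrip("/")
        self._targets: dict[str, str] = {}

    def shorten(self, full_url: str) -> str:
        # Derive a fixed-length token from the full URL (assumed scheme).
        token = hashlib.sha256(full_url.encode("utf-8")).hexdigest()[:16]
        self._targets[token] = full_url
        short_url = f"{self.handler_base_url}/{token}"
        if len(short_url) > MAX_COLUMN_URL_LENGTH:
            raise ValueError("shortened URL still exceeds the column limit")
        return short_url

    def resolve(self, short_url: str) -> str:
        # A real handler would answer with an HTTP redirect to this target.
        token = short_url.rsplit("/", 1)[-1]
        return self._targets[token]


store = LinkStore("https://links.example.com/r")
long_url = "https://archive.example.com/view?doc=" + "x" * 400
short_url = store.shorten(long_url)
```

Here `short_url` fits comfortably within the column limit, and `store.resolve(short_url)` returns the original long URL, which is the lookup a redirecting web application would perform before sending the browser on.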

For the restore functionality, this web application or server must be exposed to the internet so that SharePoint Online can access it. SharePoint Online communicates with the web application through the Restore app that is deployed on SharePoint Online. The web application must also be accessible from the IBM Content Collector server because it restores the document by using Content Collector.

When you use the restore functionality, the SSL certificate for the domain that will be exposed over the internet must be installed in IIS. To install the certificate, follow the steps in How to Set Up SSL on IIS 7.

After the certificate is installed, remote access must be enabled in IIS so that the website can be reached remotely. To enable remote connections, follow the steps in Configuring Remote Administration and Feature Delegation in IIS 7.

The following components on the Content Collector server communicate with SharePoint Online:

  • The SharePoint Connector

    The SharePoint Connector communicates with SharePoint Online to perform its activities. Within the connector, the SharePoint Discovery component is responsible for communication with SharePoint Online, and the IBM Content Collector SharePoint Connector service is responsible for communication with the Content Collector Task Routing Engine.

  • IBM Content Collector Configuration Manager
    The IBM Content Collector Configuration Manager is the administration interface in which the SharePoint connectors, all metadata, and task routes are configured. To connect to SharePoint Online, the Configuration Manager uses the SharePoint client-side object model (CSOM). The Configuration Manager performs the following actions:
    • Validates credentials for the site
    • Retrieves a list of Libraries and supported Lists in a site
    • Retrieves a list of Content Types in a site
    • Retrieves a list of Columns in a site or library
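
To make the retrieval actions listed above more concrete, the following Python sketch builds the SharePoint REST endpoints that expose the same kinds of information (lists, content types, site columns, and library columns). This is an analogy only: the Configuration Manager uses CSOM, not these REST calls, and the `rest_endpoints` helper and the example site URL are assumptions made for illustration. The sketch only constructs URLs. It does not authenticate or send requests, so credential validation is not covered.

```python
from urllib.parse import quote


def rest_endpoints(site_url: str, library_title: str) -> dict[str, str]:
    """Build REST URLs for a site's lists, content types, and columns."""
    base = site_url.rstrip("/") + "/_api/web"
    # Inside an OData string literal, a single quote is escaped by doubling it.
    title = quote(library_title.replace("'", "''"))
    return {
        "lists": f"{base}/lists",                # libraries and lists in the site
        "content_types": f"{base}/contenttypes",  # content types in the site
        "site_columns": f"{base}/fields",        # columns defined on the site
        "library_columns": f"{base}/lists/getbytitle('{title}')/fields",
    }


endpoints = rest_endpoints(
    "https://contoso.sharepoint.com/sites/hr/", "Shared Documents"
)
```

Each URL in `endpoints` would still need an authorized HTTP GET request to return data.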

IBM Content Collector Web Application service

The IBM Content Collector Web Application service provides content retrieval from a target repository.

OAuth support

Content Collector for Microsoft SharePoint Online supports OAuth. OAuth is an authorization protocol, that is, a set of rules that allows a third-party website or application to access a user’s data without the user needing to share login credentials. Content Collector for Microsoft SharePoint Online uses an Azure app for authentication.

An Azure portal administrator needs to create an app in the Azure Active Directory portal with full rights to SharePoint Online. A script is provided to create the Azure Active Directory app.

Figure: SharePoint Online OAuth protocol

Hybrid environment

  • Content Collector can have a SharePoint on-premises connection and a SharePoint Online connection configured in the same SharePoint collector.
  • Content Collector can have SharePoint on-premises and SharePoint Online task routes in the same catalog. Both can be active at the same time, but the task routes are processed sequentially.
  • Users that are synced from Active Directory and have the SharePoint administrator role in Azure Active Directory can be used for the SharePoint Online connection.

Architectural differences

Because SharePoint and SharePoint Online are hosted differently, the architecture of Content Collector for Microsoft SharePoint Online differs from that of Content Collector for Microsoft SharePoint on-premises.

Figure: SharePoint vs. SharePoint Online architecture

For SharePoint on-premises, Content Collector has a web service installed on the SharePoint server, which communicates with the SharePoint connector by using server APIs.

With SharePoint Online, a web service cannot be installed in the cloud.

Therefore, Content Collector uses client APIs to communicate with SharePoint Online from the IBM Content Collector server itself. This is one of the reasons for the difference in performance between SharePoint Online and SharePoint on-premises.

Functional differences

  • Retention labels: Archiving list items that have retention labels works differently in SharePoint Online than in SharePoint on-premises.

    In SharePoint on-premises, when documents with a retention policy are archived, the “Modified by” and “Modified date” fields are updated when Content Collector adds migration information.

    In SharePoint Online, when list items with retention labels are archived, the “Modified by” and “Modified date” fields are not updated. This is because Content Collector needs to use a different internal API with SharePoint Online for list items that have retention labels.

    For more information, refer to the 'Retention labels' section in Learn about retention policies and retention labels.

  • When the replace with link option is used, stubbed links in SharePoint on-premises have the .aspx extension, whereas stubbed links in SharePoint Online have the .URL extension.
  • For globalization of the IBM Content Collector app in SharePoint Online, the site collection must be created with a specific language so that the app creates a language-specific content type and columns when it registers.
  • Microsoft SharePoint Online has a throttling limit of 5000 documents in a view. For details, refer to Manage large lists and libraries.

    Content Collector for Microsoft SharePoint Online supports archiving from multiple libraries that together contain more than 5000 documents. However, if a single library with 5000 or more documents is selected, SharePoint Online returns a throttling error.

  • Like Content Collector for Microsoft SharePoint on-premises, Content Collector for Microsoft SharePoint Online supports classic blog sites. The new (modern) blog template that Microsoft SharePoint Online recently introduced is not supported.
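
The 5000-document limit described above applies per library, not to the total across libraries. As a rough planning aid, the following Python sketch flags libraries that would reach the limit before they are selected for archiving. The `libraries_at_risk` function and the sample document counts are hypothetical. The product itself does not provide this check, and the counts would have to be obtained from SharePoint Online separately.

```python
# Throttling limit for documents in a view, as stated in the text above.
VIEW_THRESHOLD = 5000


def libraries_at_risk(
    document_counts: dict[str, int], threshold: int = VIEW_THRESHOLD
) -> list[str]:
    """Return the names of libraries that hold `threshold` or more documents."""
    return sorted(
        name for name, count in document_counts.items() if count >= threshold
    )


# Hypothetical document counts per library.
counts = {"Contracts": 4999, "Invoices": 5000, "Archive": 12000}
flagged = libraries_at_risk(counts)
```

In this example the libraries together hold well over 5000 documents, but only "Archive" and "Invoices" are flagged, because only a single library with 5000 or more documents triggers the throttling error.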