The IBM Content Collector for Microsoft SharePoint product provides collection and archiving of SharePoint content and extends capabilities of SharePoint to leverage IBM Enterprise Content Management (ECM) products. It installs software on Microsoft SharePoint Web Front-end (WFE) servers as well as Content Collector servers.
This article describes the product components that are installed within SharePoint, details the software installation, and discusses what happens during data processing. The intended audience is SharePoint administrators, Content Collector and ECM administrators, and anyone involved in assessing the environment impact of IBM Content Collector for Microsoft SharePoint.
Components of a Content Collector for Microsoft SharePoint implementation
Figure 1 provides an overview of the major components involved in a Content Collector for Microsoft SharePoint implementation, showing their locations and communication. All cross-server communication shown is over HTTP.
Figure 1. Major components of IBM Content Collector for Microsoft SharePoint 2.2
This article covers only the Content Collector components that exist on, or communicate with, a SharePoint server.
Components on SharePoint WFE Servers
A single SharePoint installation deploys the following components to SharePoint WFE servers:
- ICCSPFeature
This SharePoint Feature adds one content type and three site columns to a SharePoint site collection. A feature is used instead of programmatic object creation in order to support localization. By design, the feature is not activated during installation, but is activated automatically during configuration or processing. This approach ensures that the additional content type and columns are created only where needed.
- ICCSPWebService
This is a web service deployed globally in SharePoint. The development team chose to run a web service on the SharePoint server for two reasons. The first is performance. The second is that the SharePoint Web Service API exposes only a subset of the SharePoint Object Model API, so building on the Object Model avoids cases where an operation cannot be performed because of the API chosen.
The web service offers these methods:
- DeleteDocument
- GetBlogPostComments
- GetDocumentContent
- GetDocumentMetadata
- GetFilesForFolder
- GetListRelativeUrl
- GetUrlsToWalk
- GetWikiPageContent
- HasDocumentBeenModifiedSince
- LockdownAndMarkDocument
- MarkDocument
- ReplaceDocumentWithLink
- TestWebService
- UpdateDocumentUrl
- ICCSPLinkHandler
This web page redirects shortened URLs to the full URLs they stand for. This is necessary because the URL site column in SharePoint has a limit of 260 characters.
- SharePoint Connector
This is the actual connector, which communicates with the Content Collector SharePoint web service to perform its activities. The SharePoint Discovery component within the SharePoint Connector is responsible for communication with SharePoint. The SharePoint Connector Service is responsible for communication with the task route engine of Content Collector.
- Configuration Manager
The administration application for Content Collector enables the configuration of all connectors, metadata, and task routes. To connect to the SharePoint server, the Configuration Manager uses the SharePoint web service API where possible and, where it is not, the Content Collector SharePoint web service (which in turn uses the SharePoint Object Model API). Specifically, it:
- validates credentials for site and web service
- retrieves a list of Libraries and supported Lists in a site
- retrieves a list of Content Types in a site
- retrieves a list of Columns in a site or library
- Content Collector Web Services
The Content Collector Web Services provide a variety of functions for the different Content Collector connectors. The primary purpose in a SharePoint connector implementation is to provide transparent content retrieval. In other words, when a user clicks a link document in SharePoint, after the Link Handler has verified the user's access, the content is retrieved from the appropriate ECM repository through the Content Collector Web Services.
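Because ICCSPWebService is an .asmx service, the methods listed earlier are reached with ordinary SOAP requests over HTTP. The following is a minimal sketch of how a request for TestWebService might be assembled. The `SERVICE_NS` namespace is a placeholder (the real one is defined in the service's WSDL), the `_vti_bin` path relies on SharePoint's standard mapping of the ISAPI folder, and the method is assumed here to take no parameters; none of these details are confirmed by this article.

```python
from xml.etree import ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical placeholder: the real namespace comes from the service's WSDL.
SERVICE_NS = "http://example.invalid/ICCSPWebService"


def service_url(site_url: str) -> str:
    """Build the endpoint URL, assuming the usual ISAPI -> _vti_bin mapping."""
    return site_url.rstrip("/") + "/_vti_bin/ICCSPWebService.asmx"


def build_envelope(method: str) -> bytes:
    """Return a SOAP 1.1 envelope whose body holds one empty method element."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Assumption: the method needs no parameters.
    ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    print(service_url("http://sharepoint.example/sites/team"))
    print(build_envelope("TestWebService").decode("utf-8"))
```

Posting the envelope would also require a SOAPAction header and the credentials of a suitably privileged user, which are left out of this sketch.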
The following sections provide detailed information about running the IBM Content Collector for Microsoft SharePoint installer on a SharePoint WFE server.
The installer inspects the Windows registry for a key for either SharePoint 2007 or SharePoint 2010. If neither registry key is found then the user is informed and installation is aborted.
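The prerequisite check can be approximated as follows. This is a sketch, not the installer's actual logic: the article does not name the registry keys, so the two paths in `VERSION_KEYS` are assumptions based on where SharePoint 2007 and 2010 conventionally register themselves. The probe function is injectable so that the decision logic can be exercised on a machine without SharePoint.

```python
from typing import Callable, Optional

# Assumed key paths under HKEY_LOCAL_MACHINE; the article does not specify them.
VERSION_KEYS = {
    "SharePoint 2007": r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0",
    "SharePoint 2010": r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0",
}


def hklm_key_exists(path: str) -> bool:
    """Return True if the key exists under HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg  # imported lazily so this module still loads on other platforms

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except OSError:
        return False


def detect_sharepoint(
    key_exists: Callable[[str], bool] = hklm_key_exists,
) -> Optional[str]:
    """Return the first supported SharePoint version found, or None."""
    for version, path in VERSION_KEYS.items():
        if key_exists(path):
            return version
    return None


if __name__ == "__main__":
    found = detect_sharepoint()
    print(found or "No supported SharePoint version found; installation would abort.")
```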
There are no other prerequisites.
The installer performs the following actions:
- Copies files to install destination folder*:
- ICCSPWebService.wsp - solution file
- stsadm.cmd – console commands to add, deploy, retract, and remove the solution on SharePoint 2007; on SharePoint 2010, it invokes ICCSPWebService.ps1
- ICCSPWebService.ps1 – PowerShell script to add, deploy, retract, and remove the solution on SharePoint 2010
- Installer files – the following folders and files are created by the InstallAnywhere installer:
- jre – folder for Java runtime
- license – folder for translated license files
- Uninstall_IBM Content Collector for Microsoft SharePoint – uninstaller folder
- IBM_Content_Collector_for_Microsoft_SharePoint_InstallLog – install log file
*The default destination folder is:
C:\Program Files\IBM\Content Collector for Microsoft SharePoint
This can be changed during installation.
- Runs stsadm.cmd to deploy the ICCSPWebService.wsp solution file.
The solution contains:
- Manifest.xml – solution manifest
- Web service files:
- ICCSPWebService.asmx
- ICCSPWebServicedisco.aspx
- ICCSPWebServicewsdl.aspx
- ICCSPWebService.dll
- Feature files:
- ContentType.xml
- Feature.xml
- SiteColumns.xml
- 22 language resource files, covering 21 languages
- Link Handler files:
- ICCSPLinkHandler.aspx
The ICCSPWebService.wsp solution is farm-deployment friendly. In other words, you only need to install it on a single SharePoint WFE server, and SharePoint automatically deploys it to the rest of the farm.
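For orientation, the add and deploy steps correspond to standard SharePoint administration commands. The sketch below generates the generic stsadm and PowerShell command lines for a farm solution that includes a Global Assembly Cache assembly. It is not a copy of the shipped stsadm.cmd or ICCSPWebService.ps1, whose exact contents this article does not show, and the function name is illustrative only.

```python
from ntpath import basename
from typing import List


def deployment_commands(sharepoint_version: str, wsp_path: str) -> List[str]:
    """Return generic add-then-deploy command lines for a farm solution."""
    name = basename(wsp_path)  # the deploy step refers to the solution by file name
    if sharepoint_version == "2007":
        return [
            f'stsadm -o addsolution -filename "{wsp_path}"',
            # -allowgacdeployment is required when the solution installs a GAC assembly
            f"stsadm -o deploysolution -name {name} -immediate -allowgacdeployment",
        ]
    if sharepoint_version == "2010":
        return [
            f'Add-SPSolution -LiteralPath "{wsp_path}"',
            f"Install-SPSolution -Identity {name} -GACDeployment",
        ]
    raise ValueError(f"unsupported SharePoint version: {sharepoint_version}")


if __name__ == "__main__":
    wsp = r"C:\Program Files\IBM\Content Collector for Microsoft SharePoint\ICCSPWebService.wsp"
    for version in ("2007", "2010"):
        print(f"SharePoint {version}:")
        for command in deployment_commands(version, wsp):
            print("  " + command)
```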
"HIVE" is used below as a short-form of the SharePoint hive location, as follows:
- For SharePoint 2007, the location is:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12
- For SharePoint 2010, the location is:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14
After deployment, SharePoint will have placed the solution files as follows:
- Web Service
ICCSPWebService.dll is deployed to the Global Assembly Cache. All other files are deployed to:
HIVE\ISAPI
- Feature
All feature files are deployed to:
HIVE\TEMPLATE\FEATURES\ICCSPFeature
- Link Handler
The Link Handler page is deployed to:
HIVE\TEMPLATE\LAYOUTS
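An administrator who wants to confirm the deployment on a WFE server could derive the expected file locations from the layout described above. The sketch below builds that list from the hive paths and file names given in this article. The function names are illustrative, the language resource files are omitted because their names are not listed here, and ICCSPWebService.dll is omitted because it resides in the Global Assembly Cache rather than in the hive.

```python
from ntpath import join  # Windows path rules, regardless of the host platform
from typing import List

HIVE_ROOT = r"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions"
HIVE_FOLDER = {"2007": "12", "2010": "14"}

# Target folder (relative to HIVE) -> files the solution places there.
DEPLOYED_FILES = {
    "ISAPI": [
        "ICCSPWebService.asmx",
        "ICCSPWebServicedisco.aspx",
        "ICCSPWebServicewsdl.aspx",
    ],
    join("TEMPLATE", "FEATURES", "ICCSPFeature"): [
        "ContentType.xml",
        "Feature.xml",
        "SiteColumns.xml",
    ],
    join("TEMPLATE", "LAYOUTS"): ["ICCSPLinkHandler.aspx"],
}


def hive(sharepoint_version: str) -> str:
    """Return the hive location for SharePoint 2007 or 2010."""
    return join(HIVE_ROOT, HIVE_FOLDER[sharepoint_version])


def expected_files(sharepoint_version: str) -> List[str]:
    """Return the full paths where the solution files should be found."""
    base = hive(sharepoint_version)
    return [
        join(base, folder, name)
        for folder, names in DEPLOYED_FILES.items()
        for name in names
    ]


if __name__ == "__main__":
    for path in expected_files("2010"):
        print(path)
```

On a WFE server, each path could then be checked with `os.path.exists` to spot an incomplete deployment.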
The user logged on to the SharePoint WFE server to perform the installation must have appropriate permissions (for example, copy files to disk, install and run applications, deploy solutions) to perform all the above actions.
When you configure SharePoint connections on the IBM Content Collector server, the credentials you provide must be those of a user who is a member of the Site Collection Administrators group.
The configuration of the load balancer does not occur during installation on the WFE server, but during installation on the IBM Content Collector server.
During processing, the SharePoint connector on the IBM Content Collector server will make web service calls while performing tasks. Here is a breakdown of which web service method calls each task may invoke.
Table 1. Web service method usage
| Task or activity | Web service methods |
|---|---|
| SP Collector | GetBlogPostComments GetDocumentContent GetDocumentMetadata GetFilesForFolder GetUrlsToWalk GetWikiPageContent |
| SP Get Versions | GetBlogPostComments GetDocumentContent GetDocumentMetadata GetWikiPageContent |
| SP Create File | No web service methods are called. Files are downloaded directly from SharePoint. |
| SP Post-processing | DeleteDocument HasDocumentBeenModifiedSince LockdownAndMarkDocument MarkDocument ReplaceDocumentWithLink |
| SP Manage Link | UpdateDocumentUrl |
| Validate button of either Initial Configuration or the Connection configuration dialog | TestWebService |
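When assessing impact, it can help to invert Table 1 and ask which tasks call a given method. The sketch below transcribes the table into a dictionary and adds a lookup helper; the names `TASK_METHODS` and `tasks_calling` are illustrative and not part of the product.

```python
from typing import Dict, List

# Transcription of Table 1: task or activity -> web service methods it may invoke.
TASK_METHODS: Dict[str, List[str]] = {
    "SP Collector": [
        "GetBlogPostComments", "GetDocumentContent", "GetDocumentMetadata",
        "GetFilesForFolder", "GetUrlsToWalk", "GetWikiPageContent",
    ],
    "SP Get Versions": [
        "GetBlogPostComments", "GetDocumentContent", "GetDocumentMetadata",
        "GetWikiPageContent",
    ],
    "SP Create File": [],  # files are downloaded directly from SharePoint
    "SP Post-processing": [
        "DeleteDocument", "HasDocumentBeenModifiedSince",
        "LockdownAndMarkDocument", "MarkDocument", "ReplaceDocumentWithLink",
    ],
    "SP Manage Link": ["UpdateDocumentUrl"],
    "Validate button": ["TestWebService"],
}


def tasks_calling(method: str) -> List[str]:
    """Return the tasks, sorted by name, that may invoke the given method."""
    return sorted(task for task, methods in TASK_METHODS.items() if method in methods)


if __name__ == "__main__":
    for method in ("GetDocumentContent", "UpdateDocumentUrl"):
        print(method, "->", tasks_calling(method))
```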
The SP Post-processing task has four options. Each performs different operations, so the options differ in their overall processing impact. They are listed here in order of typical processing time, fastest first.
Table 2. Post-processing options and their actions
| Option | Actions |
|---|---|
| Mark as processed | Tags an item as processed. |
| Delete | Removes an item. |
| Lock down | Tags an item as processed. Makes permission changes to an item. |
| Replace with link | Creates a link document. Mirrors metadata and permission grantees from original item to link document. Removes original item. |
In this article you learned about the architecture and components of IBM Content Collector for Microsoft SharePoint, and about the impact that installation and processing have on your SharePoint environment. With this information in hand, you are better prepared to assess the product's impact on your environment and to perform a successful implementation.
- Read the documentation in the IBM Archive and eDiscovery Solution Information Center.
- Get a product overview at the Content Collector for Microsoft SharePoint website.
- Learn more about IBM's Content collection and archiving solutions.
- Learn more about Microsoft SharePoint at the Microsoft SharePoint website.
- In the ECM Zone on developerWorks, get the resources you need to advance your skills in IBM Enterprise Content Management products.

Brent Benton is an engineer in the Software Group at IBM. He has worked in software development and ECM for eleven years for Yaletown Technology Group, FileNet, and IBM. His contributions to ECM and IBM include numerous ECM integration and content migration tools and products, including the design and development of a previous generation of what today is known as Content Collector for File Systems. For the past four years Brent has worked on what today is Content Collector for Microsoft SharePoint. Currently Brent is the development lead for social media connectors for IBM Content Collector.




