Crawling SharePoint Blogs

Users can create blogs in SharePoint 2007 and later. In SharePoint, blog posts and blog comments are stored as separate list items in two distinct lists. Blog posts are stored in the Posts list. The comments for all blog posts are stored in the Comments list. Each comment references its blog post by ID, which is stored in the comment's ows_PostID attribute.

Note: Requires IBM Watson™ Explorer Engine version 8.0 or greater.

To create a more natural search experience, the connector combines each blog post with all of its associated comments into a single virtual document. A blog post and its comments are therefore returned as a single search result.
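
The grouping described above can be sketched in Python as follows. This is only an illustration, not the connector's implementation: the function name combine_posts_and_comments, the dictionary layout, and the ows_ID key used for a post's own ID are assumptions. Only ows_PostID is taken from the description above.

```python
from collections import defaultdict


def combine_posts_and_comments(posts, comments):
    """Build one virtual document per blog post, with its comments attached.

    `posts` and `comments` are lists of dicts standing in for list items.
    The "ows_ID" key for the post ID is an assumption made for this sketch.
    """
    # Index the comments by the ID of the post they reference.
    comments_by_post = defaultdict(list)
    for comment in comments:
        comments_by_post[comment["ows_PostID"]].append(comment)

    # Emit one virtual document per post; posts without comments get an empty list.
    return [
        {"post": post, "comments": comments_by_post.get(post["ows_ID"], [])}
        for post in posts
    ]
```

Indexing the comments once keeps the sketch linear in the number of items, rather than rescanning the Comments list for every post.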

To distinguish between the metadata of the blog post and the metadata of the comments, the name attribute of every comment <content> element is altered to contain Comment_<ID>_, where <ID> is replaced with the ID of the comment. If the name attribute starts with ows_, Comment_<ID>_ is inserted immediately after that prefix. For example, the name of the title for a comment with an ID of 3 is changed from ows_Title to ows_Comment_3_Title. If the name attribute does not start with ows_, Comment_<ID>_ is prefixed to the attribute name (BaseType becomes Comment_3_BaseType).
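
As a minimal sketch of this renaming rule, the following Python function reproduces the two examples given above. The function name rename_comment_attribute is hypothetical and the code is not taken from the connector.

```python
def rename_comment_attribute(name, comment_id):
    """Return the name a comment attribute gets inside the virtual document."""
    marker = f"Comment_{comment_id}_"
    if name.startswith("ows_"):
        # Keep the ows_ prefix and insert the marker right after it.
        return "ows_" + marker + name[len("ows_"):]
    # No ows_ prefix: the marker is simply prepended.
    return marker + name


print(rename_comment_attribute("ows_Title", 3))  # ows_Comment_3_Title
print(rename_comment_attribute("BaseType", 3))   # Comment_3_BaseType
```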

By default, the SharePoint converter:

  • Converts the comment body to a snippet.
  • Passes the comment title and author through, but assigns summarize-discard as the output-action. This ensures that a blog post can be found by searching for a comment author or title, without polluting the search results with unnecessary metadata such as all other blog comments, comment titles, and authors.
  • Discards all other metadata.
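
The default handling listed above can be summarized as a small lookup, sketched here in Python. This is an illustration under stated assumptions, not the converter's actual configuration: the function name default_comment_action, the field names ows_Body and ows_Author, and the labels "snippet" and "discard" are hypothetical. Only ows_Title and summarize-discard appear in this topic. The names are the original comment attribute names, before Comment_<ID>_ is added.

```python
# Hypothetical field names; only ows_Title is named in the documentation.
SNIPPET_FIELDS = {"ows_Body"}
SEARCHABLE_FIELDS = {"ows_Title", "ows_Author"}


def default_comment_action(field_name):
    """Return a label for how a comment field is treated by default."""
    if field_name in SNIPPET_FIELDS:
        # The comment body is converted to a snippet.
        return "snippet"
    if field_name in SEARCHABLE_FIELDS:
        # Title and author stay searchable but are assigned summarize-discard.
        return "summarize-discard"
    # All other comment metadata is dropped.
    return "discard"
```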