This document describes the Windows Remote File System Crawler for Watson Explorer
Foundational Components. Watson Explorer provides the SMB Fileshares Connector to crawl files over
the SMB protocol. However, the SMB Fileshares Connector supports only SMB v1, so it cannot crawl
files that are shared via SMB v2 or SMB v3. The Windows Remote File System Crawler was introduced
to crawl files shared via any version of SMB.
Before you begin
Applies to Watson Explorer 11.0.2.1 and higher. Microsoft Windows only.
To retrieve file ACLs correctly, the Windows system that runs the Watson Explorer Foundational
Components engine must be joined to the Windows domain. The existing SMB Fileshares Connector has
the same requirement.
Note: The Windows Remote File System Crawler internally calls the Windows API to connect to the
remote file system, and then crawls the files as if they were local files. This is very similar to
performing a “Map network drive” operation and then crawling the files on that drive. As a result,
the SMB version and other SMB settings depend on the Windows configuration. For example, you can
specify the SMB version in the Windows settings.
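The note above compares the crawler to mapping a network drive and then crawling it like a local
folder. As a rough illustration only (this is not the crawler's implementation), the sketch below
walks a folder tree with the Python standard library. On Windows, the same call accepts a UNC path
once the share is reachable, and the SMB version is negotiated by Windows rather than chosen by the
script. The function name list_files and the share \\fileserver\shared_docs are hypothetical.

```python
import os


def list_files(root: str) -> list[str]:
    """Return the sorted paths of all files under root, including subfolders."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# Hypothetical usage on a Windows host that can reach the share:
#   for path in list_files(r"\\fileserver\shared_docs"):
#       print(path)
```

The point of the sketch is that the enumeration code is identical for local and remote roots; only
the root path changes.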
About this task
In Watson Explorer 11.0.2.3 and higher, the Windows Remote File System Crawler is listed in the
Add a new seed dialog. By default, filesystem metadata is not crawled. Set
Crawl Filesystem Metadata to true to have the crawler retrieve the available filesystem metadata
(creation date, last modified date, file attributes, etc.) for each file. The Windows Filesystem
Metadata converter must then add this metadata to the virtual document.
The Windows Remote File System Crawler is not enabled by default in Watson Explorer 11.0.2.1 and
11.0.2.2. To enable it, follow the steps below.
Procedure
- Access the Watson Explorer Engine administration tool from a web browser.
- Open the XML for the vse-crawler-seed-files function.
- Duplicate the XML and click the edit button.
- Edit the XML, replacing everything from the <prototype> tag onward with the following:
<prototype>
  <label>Windows Remote Files</label>
  <description>
    Crawl folders and their sub-directories on remote Windows file system.
    <p />
  </description>
  <declare name="files" type="separated-set" type-separator=" " required="required">
  </declare>
  <declare required="required" name="username" type="string">
    <label name="label">Username</label>
    <description name="description">
      The username to use to access the Windows file system.
    </description>
  </declare>
  <declare required="required" name="password" type="password">
    <label name="label">Password</label>
    <description name="description">
      The password to use to access the Windows file system.
    </description>
  </declare>
</prototype>
<process-xsl>
  <![CDATA[
    <xsl:param name="files"/>
    <xsl:template match="/">
      <xsl:variable name="names" select="str:tokenize($files, ' ')"/>
      <xsl:variable name="fixed-urls">
        <xsl:for-each select="$names">
          <xsl:value-of select="viv:if-else(starts-with(., 'file:'), ., viv:url-normalize(viv:url-build('file', '', '', '', -1, viv:if-else(starts-with(., '/'), ., concat('/', .)), '')))"/>
          <xsl:text> </xsl:text>
        </xsl:for-each>
      </xsl:variable>
      <call-function name="vse-crawler-seed-urls-common">
        <with name="urls" value="{$fixed-urls}" />
        <with name="how" value="path" />
        <with name="username" copy-of="username" />
        <with name="password" copy-of="password" />
      </call-function>
    </xsl:template>
  ]]>
</process-xsl>
</function>
- Create a collection.
- Click . Click Add a seed.
- Choose Windows Remote Files.
- Fill in the file path, username, and password. The account must have privileges to mount the
file system. Specify the username together with the domain that the user belongs to, such as
domain\user1 instead of just user1. Specify the file path as
\\<File Server>\<Shared_Directory>.
- Start crawling.
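The <process-xsl> block in the function XML above turns the space-separated files value into a
list of file URLs. The following Python sketch mirrors only its branching, as an aid to reading the
XSL. It assumes that str:tokenize drops empty tokens, and it uses a plain "file://" prefix as a
stand-in for viv:url-build and viv:url-normalize, whose real output may differ. The function name
fix_seed_urls is hypothetical.

```python
def fix_seed_urls(files: str) -> str:
    """Mimic the branching of the <process-xsl> template for a seed string."""
    fixed = []
    # str:tokenize($files, ' ') splits on spaces; empty tokens are assumed dropped.
    for name in (token for token in files.split(" ") if token):
        if name.startswith("file:"):
            # Already a file: URL, so it is passed through unchanged.
            fixed.append(name)
        else:
            # Ensure a leading slash, then build a file URL.
            # ASSUMPTION: "file://" + path stands in for viv:url-build and
            # viv:url-normalize; the real functions may encode or normalize more.
            path = name if name.startswith("/") else "/" + name
            fixed.append("file://" + path)
    # The template emits one space after every URL, including the last one.
    return "".join(url + " " for url in fixed)
```

Read this way, each space-separated entry becomes its own seed, so a path that contains a space
would presumably be split into two entries.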
Note: If you change the username after crawling, reboot the server. Otherwise, the crawl continues
to use the old username.
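Because the seed expects a domain-qualified username and a share path, it can help to check both
values before entering them. The sketch below is a minimal pre-check, assuming the path is a
standard UNC path that begins with two backslashes and that the username has the domain\user form
described in the procedure. The helper names and the sample values (fs01, docs, domain\user1) are
hypothetical, and the patterns are deliberately loose rather than a full validation of Windows
naming rules.

```python
import re

# \\server\share, optionally followed by \folder components and a trailing backslash.
_UNC_PATTERN = re.compile(r"\\\\[^\\/]+\\[^\\/]+(?:\\[^\\/]+)*\\?")
# domain\user with exactly one backslash separating the two parts.
_USER_PATTERN = re.compile(r"[^\\/@]+\\[^\\/@]+")


def is_unc_path(path: str) -> bool:
    """Return True if path looks like \\\\server\\share[\\folder...]."""
    return _UNC_PATTERN.fullmatch(path) is not None


def is_qualified_username(username: str) -> bool:
    """Return True if username looks like domain\\user."""
    return _USER_PATTERN.fullmatch(username) is not None
```

For example, is_unc_path(r"\\fs01\docs") and is_qualified_username(r"domain\user1") both return
True, while a path with a single leading backslash or a bare user1 is rejected.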