Creating primary volumes

A primary volume serves as a primary data source in IBM® StoredIQ®. You must have at least one primary volume within your configuration.

Procedure

Go to Administration > Data sources > Specify volumes > Volumes.
On the Primary volume list page, click Add primary volumes.
Enter the information for your server type. The options for each server type are described in the tables that follow.
Except for Chatter, Domino®, and Jive volumes, volumes can also be added in IBM StoredIQ Administrator. However, the set of available configuration options varies slightly. For example, settings for the synchronization with a governance catalog can be configured only in IBM StoredIQ Administrator.

Box and OneDrive volumes can be added from IBM StoredIQ Administrator only.
Click OK to save the volume.

Select one of the following options:

Add another volume on the same server.
Add another volume on a different server.
Finished adding volumes.

The following tables describe the fields that are available in the Add volume dialog box when you configure primary volumes.

Note: Case-sensitivity rules for each server type apply. Red asterisks within the user interface denote required fields.

Table 1. CIFS/SMB, SMB2, or SMB3 (Windows platforms) primary volumes
Field	Value	Notes
Server type	Select CIFS (Windows platform).	SMB, SMB2, and SMB3 are supported. Depending on the setup of your SMB server, some additional SMB configuration might be required on the IBM StoredIQ data server. For details, see Configuring SMB properties. If you want to preserve ownership of objects in Copy or Move actions between CIFS volumes, you can add an admin knob as described in Enabling ownership preservation for objects on CIFS volumes.
Server	Enter the fully qualified name of the server where the volume is available for mounting. If you create a volume for use with Distributed File System (DFS) services, provide the following information: For a domain-based namespace, specify the fully qualified domain name (FQDN) of the server. For a standalone namespace, specify the hostname of the namespace server.	For using DFS services, the `jcifs.smb.client.dfs.disabled` SMB property in the jcifs.properties file must be set to false. For details, see the property description.
Connect as	Enter the logon ID that is used to connect and mount the defined volume. If you create a volume for use with Distributed File System (DFS) services, enter the fully qualified domain name of the server and the user name for connecting and mounting the volume in the format `FQDN\user`.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name of the share to be mounted. If you create a volume for use with Distributed File System (DFS) services, enter the DFS namespace.	Data from file or directory symbolic links in a share cannot be harvested.
Initial directory	Enter the name of the initial directory from which the harvest must begin. If you create a volume for use with Distributed File System (DFS) services, enter the folder target.	With this feature, you can select a volume further down the directory tree rather than selecting an entire volume. If you create a volume for use with Distributed File System (DFS) services and want to include all namespace folders, do not specify an initial directory.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	If needed, specify a regular expression for the directories to include in each harvest.	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Access times	Select one of these options: Reset access times but do not synchronize them. (This is the default setting.) Do not reset or synchronize access times. Reset and synchronize access times on incremental harvests.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads. Scope harvests on these volumes by extension: Include or exclude data objects based on their extension.
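
The range that Start directory and End directory define can be pictured as a half-open interval: the start entry is harvested, the end entry is not. The following Python sketch is illustrative only. It assumes that names are compared as plain case-sensitive strings, which this documentation does not specify, and `in_partition` is a hypothetical helper, not part of IBM StoredIQ.

```python
def in_partition(name, start=None, end=None):
    """Return True if a name falls in the half-open range [start, end).

    Assumption: names are compared as plain case-sensitive strings.
    The product might order names differently.
    """
    if start is not None and name < start:
        return False  # sorts before the start entry
    if end is not None and name >= end:
        return False  # the end entry itself is not harvested
    return True


# Hypothetical top-level entries under the initial directory.
names = ["alpha", "bravo", "charlie", "delta", "echo"]
selected = [n for n in names if in_partition(n, start="bravo", end="delta")]
print(selected)  # ['bravo', 'charlie']
```

Under this assumption, a large volume could be split into partitions such as "up to delta" and "from delta onward" without overlap, because each end entry is excluded from its own partition.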

Table 2. NFS v2 and v3 primary volumes
Field	Value	Notes
Server type	Select NFS v2 or NFS v3.
Server	Enter the fully qualified name of the server where the volume is available for mounting.
Volume	Enter the name or names of the volume to be mounted.
Initial directory	Enter the name of the initial directory from which the harvest must begin.	With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	If needed, specify a regular expression for the directories to include in each harvest.	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory, that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Access times	Select one of these options: Reset access times but do not synchronize them. (This is the default setting.) Do not reset or synchronize access times. Reset and synchronize access times on incremental harvests.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads. Scope harvests on these volumes by extension: Include or exclude data objects based on their extension.
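
The Include directories field takes a regular expression that is applied to "first node" directories, that is, the first path component below the starting directory. The following Python sketch shows one way to picture this. It is an approximation under stated assumptions: the regular expression dialect and the matching mode (full match is used here) are not specified in this documentation, and `first_node` and `is_included` are hypothetical helpers.

```python
import re
from pathlib import PurePosixPath


def first_node(relative_path):
    """Return the first path component below the starting directory."""
    return PurePosixPath(relative_path).parts[0]


def is_included(relative_path, pattern):
    """Return True if the first-node directory fully matches the pattern.

    Assumption: Python regular expression syntax and full-match semantics.
    """
    return re.fullmatch(pattern, first_node(relative_path)) is not None


# Hypothetical paths, relative to the starting directory.
paths = ["finance/2023/q1.xlsx", "hr/policies.doc", "finance_old/a.txt"]
included = [p for p in paths if is_included(p, r"finance|hr")]
print(included)  # ['finance/2023/q1.xlsx', 'hr/policies.doc']
```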

Table 3. Exchange primary volumes
Field	Value	Notes
Server type	Select Exchange.
Version	In the Version list, select the appropriate version. Options include 2000/2003, 2007, 2010/2013/2016, and Online.	Exchange Online volumes require some prerequisite configuration. For more information, see Registering IBM StoredIQ as a Microsoft service application for access to Exchange Online.
Server	Enter the fully qualified name of the server where the volume is available for mounting. Alternatively, you can enter a server alias.	For Exchange primary volumes, this is the fully qualified domain name of the server where OWA is located. Multiple Client Access servers on Exchange 2007 are supported. The server load must be balanced at the IP or DNS level. If you selected Online as the Version option, this field fills automatically with the Exchange Online server name.
Mailbox server	When you configure multiple client access servers, enter the names of one or more mailbox servers, separated by commas.	For Exchange primary volumes, this is the fully qualified domain name of the server where the mailbox to be harvested is located. If you selected Online as the Version option, this field is not available.
Active Directory server	Enter the name of the Active Directory server.	It must be a fully qualified Active Directory server. If you selected Online as the Version option, this field is not available.
Authentication URL	This field is prepopulated with the appropriate URL for authenticating with Azure AD. Do not change this value.	This field is available only if you selected Online as the Version option.
Impersonation scope	This field is prepopulated with the management scope for the impersonation. This scope defines the group of accounts for which impersonation is allowed. Do not change this value.	This field is available only if you selected Online as the Version option.
Impersonation account	Enter the user account to use for connecting to Exchange Online. This account must be authorized to impersonate the members of the specified impersonation scope.	This field is available only if you selected Online as the Version option.
Client ID	Enter the application (client) ID under which IBM StoredIQ is registered with Microsoft.	This field is available only if you selected Online as the Version option.
Client secret	Enter the client secret that is associated with the client ID. The values make up the credentials for access to a Microsoft Exchange Online data source.	This field is available only if you selected Online as the Version option.
Protocol	To use SSL, select the Protocol checkbox.	If you selected Online as the Version option, the Use SSL checkbox is automatically selected, and this field cannot be edited.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.	If you selected Online as the Version option, this field is not available.
Password	Enter the password that is used to connect and mount the defined volume.	If you selected Online as the Version option, this field is not available.
Volume	Enter the name or names of the volume to be mounted.	For Exchange, enter a friendly name for the volume.
Folder	Select either of the Mailboxes or Public folders options.
Initial directory	Enter the name of the initial directory from which the harvest must begin.	For Exchange, this field must be left blank if you are harvesting all mailboxes. If you are harvesting a single mailbox, enter the email address for that mailbox.
Virtual root	The name defaults to the correct endpoint for the selected Exchange version.
Personal archives	Select Harvest personal archive to harvest personal archives.	This checkbox is available only when Exchange 2010/2013/2016 or Online is selected.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Remove journal envelope		When selected, the journal envelope is removed.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
Include directories	If needed, specify a regular expression for the directories to include in each harvest.	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Use Start directory and End directory to scope the harvest to a date range. Specify the start of the date range in the format `YYYY-MM-DD`.	The date range is relative to the initial directory. For example, you can create an Exchange volume for the user John Doe's mailbox by setting that mailbox as initial directory and then limit the harvest to mails in this mailbox within the date range defined by the start and end directories.
End directory	Use Start directory and End directory to scope the harvest to a date range. Specify the end of the date range in the format `YYYY-MM-DD`. Harvesting stops at the first email with this date stamp.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.
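
For Exchange volumes, Start directory and End directory take dates in the format `YYYY-MM-DD`. Before you enter the values, you can sanity-check them with a short script such as the following Python sketch. The helper names are hypothetical, and the check that the start date is not later than the end date is an assumption about sensible input, not a documented product rule.

```python
import re
from datetime import date, datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_harvest_date(value):
    """Parse a strict YYYY-MM-DD string; raise ValueError if malformed."""
    if DATE_PATTERN.fullmatch(value) is None:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # strptime also rejects impossible dates such as 2023-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()


def check_date_range(start, end):
    """Return (start_date, end_date); raise ValueError if start is after end."""
    start_date = parse_harvest_date(start)
    end_date = parse_harvest_date(end)
    if start_date > end_date:
        raise ValueError("start date must not be later than end date")
    return start_date, end_date


print(check_date_range("2023-01-01", "2023-06-30"))
```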

Table 4. SharePoint primary volumes.
Prerequisites: For SharePoint volume prerequisites and configuration information, see Configuration of SharePoint and Special note about SharePoint volumes.
Field	Value	Notes
Server type	SharePoint	Required.
Version	Select one of these versions: 2003, 2007, 2010, 2013, 2016, or Online.	Required.
Server	The fully qualified name of the SharePoint server.	Required. When you add SharePoint volumes that contain spaces in the URL, see Special Note: Adding SharePoint Volumes.
Active Directory server	The name of the Active Directory server.	Optional. Specify the fully qualified Active Directory server name. This option is not available for SharePoint Online.
Protocol	Select Use SSL only if SSL is enabled for this SharePoint server. For SharePoint Online, this option is automatically selected and cannot be edited.	Optional. If SSL is enabled on the SharePoint server and you do not select this option, no volume is created and the HTTP status code `301 Moved Permanently` is returned. To fix the issue, select the option. If SSL is not enabled on the SharePoint server and you select this option, no volume is created and the socket error `[Errno 111] Connection Refused` is returned. To fix this issue, clear the Use SSL checkbox.
Connect as	Enter the name of a user with the required permissions for that site collection. Use the following syntax: SharePoint Online: `userid@Microsoft_cloudname.com`. Other SharePoint versions: `Active Directory Domain Name\username`.	Required. Use a site collection administrator account. No volume can be added if the validation of the credentials fails, which can happen for the following reasons: The user does not exist or does not have the required permissions. The password is not correct. The HTTP status code is usually `401 Unauthorized`. However, for SharePoint Online, the HTTP status code `400 Bad Request` is returned for insufficient permissions.
Password	Enter the password for the user specified in Connect as.	Required.
Volume	Enter the URL of the SharePoint site collection, for example: /portal/site	Required. Do not include the SharePoint server name in the URL, otherwise the URL cannot be located on the server and thus no volume is created. When you add SharePoint volumes that contain spaces in the URL, see Special Note: Adding SharePoint Volumes.
Initial directory	Enter the name of the subsite from which you want the harvest to start.	Optional. With this feature, you can select a volume further down the directory tree rather than selecting an entire volume. When you add SharePoint volumes that contain spaces in the URL, see Special Note: Adding SharePoint Volumes.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	Both options are selected by default. Tip: Leave the Include metadata for contained objects checkbox selected to have metadata for objects in containers added to the metadata index. To avoid creating a full-text index for the entire volume, clear the Include content tagging and full-text index checkbox. Create a full-text index for a subset of data later by running a Step-up Analytics action. For SharePoint Online, full-text indexing of OneNote notebook objects, that is, Notes®, is not supported currently. FSMD-based searches for these files are supported.
Subsites	To check all sites and subsites of the site collection for data objects, select Recurse into subsites.	Optional.
Versions	To harvest all document versions, select Include all versions.	Optional. IBM StoredIQ supports indexing versions from SharePoint. For more information, see Special Note: Adding SharePoint Volumes.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	If needed, specify a regular expression for the directories to include in each harvest.	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory, that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.
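
The error conditions that the Protocol and Connect as rows describe can be summarized as a small lookup from the returned error text to the suggested fix. The following Python sketch is only a reading aid. The `hint_for` function and the way the error text is matched are hypothetical; the error strings and fixes are the ones listed in the table.

```python
# Markers are lowercase fragments of the errors named in the table above.
HINTS = [
    ("301 moved permanently",
     "Select the Use SSL checkbox: SSL appears to be enabled on the server."),
    ("[errno 111]",
     "Clear the Use SSL checkbox: SSL appears not to be enabled on the server."),
    ("401 unauthorized",
     "Check that the user exists, has the required permissions, "
     "and that the password is correct."),
    ("400 bad request",
     "For SharePoint Online, check the permissions of the user."),
]


def hint_for(message):
    """Return the suggested fix for an error message, or None if unknown."""
    text = message.lower()
    for marker, hint in HINTS:
        if marker in text:
            return hint
    return None


print(hint_for("socket error: [Errno 111] Connection Refused"))
```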

Table 5. Documentum primary volumes.
Prerequisites: Before you can add Documentum volumes, you must add the Documentum server. For more information, see Adding a Documentum server as a data source.
Field	Value	Notes
Server type	Select Documentum.	For Documentum, you must specify the doc broker. See Configuring Documentum.
Doc base	Enter the name of the Documentum repository.	A Documentum repository contains cabinets, and cabinets contain folders and documents.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name or names of the volume to be mounted.	For Documentum, enter a friendly name for the volume.
Harvest	To enable harvesting all document versions, select Harvest all document versions.	Important: If you do not select this option for the initial harvest, changing the setting later does not have an effect when the volume is reharvested. As a workaround, create a new volume and ensure that the Harvest all document versions option is set before you start harvesting.
Initial directory	Enter the name of the initial directory from which the harvest must begin.	With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
Include directories	If needed, specify a regular expression for the directories to include in each harvest.	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.

Table 6. Domino primary volumes
Field	Value	Notes
Server type	Select Domino.	For Domino, you must first upload at least one user.id. See Adding Domino as a Primary Volume.
Server	Enter the fully qualified name of the server where the volume is available for mounting. It can happen that the data server cannot find the Domino server based on this information because the DNS resolution fails. In this case, ping the Domino server. If the ping is successful, add an entry to the data server's /etc/hosts file. The entry must consist of the common name portion of the Domino server name (omit the domain portion) and the IP address associated with the Domino server, for example: `198.51.100.0 NALLN999`	For Domino, select the appropriate user name, which was entered with the Configuration subtab in the Lotus Notes® user administration area.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.	For Domino, select the user name for the primary user ID. The user ID must be configured on the System Configuration screen under the Lotus Notes user administration link.
Password	Enter the password that is used to connect and mount the defined volume.	For Domino, enter the password for the primary user ID.
Volume	Enter the name or names of the volume to be mounted.	For Domino, enter a friendly name for the volume.
Harvest	To harvest mailboxes, select the Harvest mailboxes option. To harvest mail journals, select the Harvest mail journals option. To harvest all applications, select the Harvest all applications option.	This option obtains the list of all known Domino users and their NSFs. It then harvests those mailboxes unless it was pointed to a single mailbox with the initial directory.
Initial directory	Enter the name of the initial directory.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	If needed, specify a regular expression for the directories to include in each harvest.	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.
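
The /etc/hosts entry that the Server row describes pairs the IP address of the Domino server with the common name portion of its server name. The following Python sketch builds such a line. It assumes that the server name is written in a hierarchical form such as `NALLN999/EXAMPLE`, where everything after the first slash is the portion to omit; that form, the `EXAMPLE` suffix, and the `hosts_entry` helper are assumptions for illustration.

```python
import ipaddress


def hosts_entry(ip_address, domino_server_name):
    """Build an /etc/hosts line from an IP address and a Domino server name.

    Assumption: the part to omit follows the first "/" in the server name,
    and an optional "CN=" prefix can be dropped.
    """
    ipaddress.ip_address(ip_address)  # raises ValueError for an invalid address
    common_name = domino_server_name.split("/", 1)[0]
    if common_name[:3].upper() == "CN=":
        common_name = common_name[3:]
    return f"{ip_address} {common_name}"


print(hosts_entry("198.51.100.0", "NALLN999/EXAMPLE"))  # 198.51.100.0 NALLN999
```

The script only prints the line. Adding it to /etc/hosts on the data server remains a manual step.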

Table 7. FileNet primary volumes
Field	Value	Notes
Server type	Select FileNet.	Within IBM StoredIQ Data Server, the FileNet® domain must be configured before any FileNet volumes are created. See Configuring FileNet.
FileNet config	Select the FileNet server you would like to use for this configuration.	For more information, see Configuring FileNet.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Domain	The domain name is populated automatically.
Object store	Select an object store.	The object store must exist before you create a FileNet primary volume.
Volume	Enter the name or names of the volume to be mounted.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Constraints	Select one of these options: Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.

Table 8. NewsGator primary volumes
Field	Value	Notes
Server type	Select NewsGator.
Server	Enter the fully qualified name of the server where the volume is available for mounting.
Protocol	To use SSL, select the Protocol checkbox.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name or names of the volume to be mounted.	Enter a friendly name for the volume.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

Table 9. Livelink primary volumes
Prerequisite: A copy of the lapi.jar file from a Livelink API installation must be available in the /usr/local/IBM/ICI/vendor directory on each data server in your deployment. Usually, you can find this file on the Livelink server in the C:\OPENTEXT\application\WEB-INF\lib directory. However, the path might be different in your Livelink installation.
Field	Value	Notes
Server type	Select Livelink.
Server	Enter the fully qualified name of the server where the volume is available for mounting.
Port	Enter the port number to be used.
Database	Enter the name of the database.
Search slice	Enter the name of the search slice.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name or names of the volume to be mounted.
Initial directory	Enter the search slice and the name of the initial directory from which the harvest must begin.	With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	Specify a regular expression for included directories for each harvest (if it was specified).	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads. Scope harvests on these volumes by extension: Designate which data objects must be harvested by entering those object extensions.
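
The Start directory and End directory rows describe a range that includes the start name and excludes the end name. The following Python sketch illustrates that half-open range on a list of names. It is not the product's implementation: the function `in_partition`, the sample entry names, and the use of plain string comparison are assumptions made for illustration, and the ordering rules that IBM StoredIQ applies (for example, case handling) are not stated here.

```python
def in_partition(name, start=None, end=None):
    """Return True if `name` falls in the half-open range [start, end).

    Assumption: plain string comparison. The ordering that the product
    uses is not described in this document.
    """
    if start is not None and name < start:
        return False  # sorts before the start directory
    if end is not None and name >= end:
        return False  # the end directory itself is not harvested
    return True


# Hypothetical entries directly under the initial directory.
entries = ["archive", "contracts", "finance", "hr", "legal", "marketing"]

selected = [e for e in entries if in_partition(e, start="contracts", end="legal")]
print(selected)  # prints ['contracts', 'finance', 'hr']
```

To partition a large volume in this model, a second volume definition would use "legal" as its start directory so that the two ranges do not overlap or leave a gap.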

Table 10. Jive primary volumes
Field	Value	Notes
Server type	Select Jive.
Server	Enter the fully qualified name of the server where the volume is available for mounting.
Protocol	To use SSL, select the Protocol checkbox.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name or names of the volume to be mounted.
Initial directory	Enter the name of the initial directory from which the harvest must begin.	With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Versions	To harvest all document versions, select Include all versions.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	Specify a regular expression for included directories for each harvest (if it was specified).	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.
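
The Include directories row refers to "first node" directories relative to the starting directory. The following Python sketch shows one way to read that: only the first path component is tested against the regular expression. The helper names, the sample paths, and the pattern are hypothetical, and the sketch assumes Python `re` syntax with a match on the whole directory name. The regular expression dialect and anchoring that IBM StoredIQ uses are not stated here.

```python
import re


def first_node(relative_path):
    """Return the first component of a path relative to the starting directory."""
    return relative_path.strip("/").split("/", 1)[0]


def is_included(relative_path, include_pattern):
    # Assumption: the expression must match the entire first-node name.
    return re.fullmatch(include_pattern, first_node(relative_path)) is not None


# Hypothetical paths, relative to the starting directory.
paths = [
    "projects-2023/plan.docx",
    "projects-2024/q1/report.pdf",
    "tmp/scratch.txt",
    "archive/projects-2019/a.txt",
]
pattern = r"projects-\d{4}"

included = [p for p in paths if is_included(p, pattern)]
print(included)
```

In this sketch, `archive/projects-2019/a.txt` is not selected because its first node is `archive`, even though a deeper component matches the pattern.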

Table 11. Chatter primary volumes
Field	Value	Notes
Server type	Select Chatter.	For Chatter, see Configuring Chatter messages.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Auth token	Enter the token that is used to authenticate the Chatter volume.	The auth token must match the user name that is used in the Connect as field. Auth tokens can be generated online on Salesforce. See Configuring Chatter messages.
Volume	Enter the name or names of the volume to be mounted.
Initial directory	Enter the name of the initial directory from which the harvest must begin.	With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	Specify a regular expression for included directories for each harvest (if it was specified).	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.

Table 12. IBM Content Manager primary volumes
Field	Value	Notes
Server type	Select IBM Content Manager.
Server	Enter the fully qualified host name of the library server database.
Port	Enter the port that is used to access the library server database.
Repository	Enter the name of the library server database.
Database type	Select the type of database that is associated with the volume. Options include DB2 and Oracle. By default, DB2 is selected.
Schema	Enter the schema of the library server database.
Remote database	Enter the name of the remote database.	Optional.
Connection String	Enter any additional parameters.	Optional.
Harvest itemtype	Enter the names of the item types to be harvested, separated by commas.	Required.
Copy to itemtype	Only `SiqDocument` is supported.	For more information, see IBM Content Manager attributes.
Connect as	Enter the logon ID that is used to connect and mount the defined volume. Note: This Content Manager user ID must have access to all documents to be able to create documents in the `SiqDocument` item type for copy to. If the `SiqDocument` item type does not exist, a Content Manager administration ID must be used, as this ID creates the item type when the volume is created.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name or names of the volume to be mounted.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	Both options are selected by default.
Validation	To validate volume accessibility, select Validation.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads. Scope harvests on these volumes by extension: Designate which data objects must be harvested by entering those object extensions.

Table 13. CMIS primary volumes
Field	Value	Notes
Server type	Select CMIS.
Server	Enter the fully qualified name of the server where the volume is available for mounting.
Port	Enter the port number.
Repository	Enter the name of the repository.
Service	In the Service text box, enter the name of the service.
Protocol	To use SSL, select the Protocol checkbox.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.
Volume	Enter the name or names of the volume to be mounted.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads.

Table 14. HDFS primary volumes
Field	Value	Notes
Server type	Select HDFS.	Required.
Server	Enter the host name or IP address.	Required.
Port	Enter the port number.
Repository	Enter the name of the repository.
Option string	This option is supported: VerifyCertificate=True.	This option indicates whether the validity of the HDFS server's SSL certificate is verified when SSL is used. Valid values are True and False. If no value is specified, the default is False. To validate the certificate on the HDFS server, specify this option and set the value to True.
Protocol	To use SSL, select the Protocol checkbox.
Connect as	Enter the logon ID that is used to connect and mount the defined volume.
Password	Enter the password that is used to connect and mount the defined volume.	Authentication to HDFS is not supported. If your HDFS server requires a password, StoredIQ is not able to connect to it.
Volume	Enter the name or names of the volume to be mounted.
Initial directory	Enter the name of the initial directory from which the harvest must begin.	To avoid the interrogator timeout issue, see the note that follows this table.
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Validation	To validate volume accessibility, select this option.	If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads. Scope harvests on these volumes by extension: Designate which data objects must be harvested by entering those object extensions.

Note: If you harvest HDFS volumes that have many files in a single directory, an interrogator timeout might occur, which results in a Skipped directory exception in the harvest audit. HDFS responds slowly to StoredIQ when it handles large directories, and StoredIQ must process more responses from HDFS. The slow response is caused by high CPU usage on the HDFS NameNode. Therefore, if an interrogator timeout occurs and you observe high CPU usage on the HDFS server, you can allocate more CPU resources to the HDFS server.

To avoid the interrogator timeout issue, you can also limit the number of files in a directory to 250,000. Because each directory has its own timeout, keeping fewer files in a single directory helps the harvest run efficiently. Splitting large directories into many small ones also helps resolve interrogator timeout issues. For example, 1,000,000 files that are equally distributed across 10 directories carry a lower risk of timeouts than the same files in one directory.
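
The sizing guidance in this note can be checked with simple arithmetic. The following Python sketch computes how many equally filled directories are needed to stay at or under the 250,000-file limit, and how many files each directory holds for a given split. The function names are hypothetical, and the sketch only models the file counts. It does not predict whether a timeout occurs.

```python
MAX_FILES_PER_DIRECTORY = 250_000  # limit suggested in the note


def directories_needed(total_files, limit=MAX_FILES_PER_DIRECTORY):
    """Smallest number of directories that keeps each one at or under the limit."""
    if total_files <= 0:
        return 0
    return -(-total_files // limit)  # integer ceiling division


def files_per_directory(total_files, directory_count):
    """Largest per-directory count when files are spread as evenly as possible."""
    return -(-total_files // directory_count)


print(directories_needed(1_000_000))       # prints 4
print(files_per_directory(1_000_000, 10))  # prints 100000
```

For the example in the note, 10 directories hold 100,000 files each, which is well under the limit. Four directories would be the minimum that still satisfies it.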

Table 15. Connections primary volumes
Field	Value	Notes
Server type	Select Connections in the Server type list.	Required
Server	Enter the fully qualified domain name of the server from which the volume is available for mounting.	Required
Class name	Enter `deepfile.fs.template.impl.ibmconnections.ibmconnectionsconn.IBMConnections`	Required
Repository name	Enter `deepfile.fs.template.impl.ibmconnections.ibmconnectionsconn`	Required
Option string
Connect as	Enter the user name of the account that is set up with admin and search-admin privileges on the Connections server.	Required
Password	Enter the password of the account that is set up with admin and search-admin privileges on the Connections server.	Required
Volume	Enter any name.	Required
Initial Directory
Index options	Select either or both of the Index options checkboxes. Include system metadata for data objects within containers. Include content tagging and full-text index.	These options are selected by default.
Volume	Enter the name or names of the volume to be mounted.
Validation	To validate volume accessibility, select this option.	When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed. If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.
Include directories	Specify a regular expression for included directories for each harvest (if it was specified).	These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory that is considered part of the logical volume.
Start directory	Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory. You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
End directory	Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest. If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.
Access Times	Select one of these options: Reset access times but do not synchronize them. (This setting is the default setting.) Do not reset or synchronize access times. Reset and synchronize access times on incremental harvests.
Constraints	Select one of these options: Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab. Control the number of parallel data object reads: Designate the number of parallel data object reads. Scope harvests on these volumes by extension: Include or exclude data objects based on their extension.
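
The "Scope harvests on these volumes by extension" option includes or excludes data objects by extension. The following Python sketch illustrates that kind of filter. It is not the product's implementation: the function `in_scope`, its `mode` parameter, and the sample names are hypothetical, and the sketch assumes that matching is case-insensitive and uses only the last extension of a name.

```python
from pathlib import PurePosixPath


def in_scope(object_name, extensions, mode="include"):
    """Return True if the object is harvested under an extension scope.

    Assumptions: case-insensitive matching on the final extension only.
    mode is "include" (harvest only listed extensions) or "exclude"
    (harvest everything except listed extensions).
    """
    ext = PurePosixPath(object_name).suffix.lower().lstrip(".")
    listed = ext in {e.lower().lstrip(".") for e in extensions}
    return listed if mode == "include" else not listed


# Hypothetical object names.
names = ["report.PDF", "notes.txt", "archive.tar.gz", "README"]

print([n for n in names if in_scope(n, ["pdf", "txt"], mode="include")])
print([n for n in names if in_scope(n, ["gz"], mode="exclude")])
```

In this sketch, an object without an extension, such as `README`, is never selected by an include list and is always kept by an exclude list.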