Creating primary volumes

A primary volume serves as a primary data source in IBM® StoredIQ®. You must have at least one primary volume within your configuration.

Procedure

  1. Go to Administration > Data sources > Specify volumes > Volumes.
  2. On the Primary volume list page, click Add primary volumes.
  3. Enter the information for your server type as described in the following tables.
    Each table describes the options for one server type.

    Except for Chatter, Domino®, and Jive volumes, volumes can also be added in IBM StoredIQ Administrator. However, the set of available configuration options varies slightly. For example, settings for the synchronization with a governance catalog can be configured only in IBM StoredIQ Administrator.

    Box and OneDrive volumes can be added from IBM StoredIQ Administrator only.

  4. Click OK to save the volume.
  5. Select one of the following options:
    • Add another volume on the same server.
    • Add another volume on a different server.
    • Finished adding volumes.

    The following tables describe the fields that are available in the Add volume dialog box when you configure primary volumes.

    Note: Case-sensitivity rules for each server type apply. Red asterisks within the user interface denote required fields.
    Table 1. CIFS/SMB, SMB2, or SMB3 (Windows platforms) primary volumes
    Field Value Notes
    Server type Select CIFS (Windows platform). SMB, SMB2, and SMB3 are supported. Depending on the setup of your SMB server, some additional SMB configuration might be required on the IBM StoredIQ data server. For details, see Configuring SMB properties.

    If you want to preserve ownership of objects in Copy or Move actions between CIFS volumes, you can add an admin knob as described in Enabling ownership preservation for objects on CIFS volumes.

    Server Enter the fully qualified name of the server where the volume is available for mounting.
    If you create a volume for use with Distributed File System (DFS) services, provide the following information:
    • For a domain-based namespace, specify the fully qualified domain name (FQDN) of the server.
    • For a standalone namespace, specify the hostname of the namespace server.
    To use DFS services, the jcifs.smb.client.dfs.disabled SMB property in the jcifs.properties file must be set to false. For details, see the property description.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.

    If you create a volume for use with Distributed File System (DFS) services, enter the fully qualified domain name of the server and the user name for connecting and mounting the volume in the format FQDN\user.

     
    Password Enter the password that is used to connect and mount the defined volume.
    Volume Enter the name of the share to be mounted.

    If you create a volume for use with Distributed File System (DFS) services, enter the DFS namespace.

    Data from file or directory symbolic links in a share cannot be harvested.
    Initial directory Enter the name of the initial directory from which the harvest must begin.

    If you create a volume for use with Distributed File System (DFS) services, enter the folder target.

    With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.

    If you create a volume for use with Distributed File System (DFS) services and want to include all namespace folders, do not specify an initial directory.

    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories Specify a regular expression that identifies the directories to include in each harvest. These directories are "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Access times Select one of these options:
    • Reset access times but do not synchronize them. (This setting is the default setting.)
    • Do not reset or synchronize access times.
    • Reset and synchronize access times on incremental harvests.
    Constraints Select one of these options:
    • Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    • Scope harvests on these volumes by extension: Include or exclude data objects that are based on extension.
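The Start directory and End directory rows in Table 1 describe a half-open range: the start entry is harvested, and the end entry is where the harvest stops and is itself excluded. The following Python sketch illustrates that rule only. It assumes plain, case-sensitive string comparison of first-level names, which is an assumption (the ordering that IBM StoredIQ actually applies is not described in this topic), and the function names are hypothetical.

```python
from typing import Iterable, List, Optional


def in_partition(name: str, start: Optional[str], end: Optional[str]) -> bool:
    """Return True if name falls in the half-open range [start, end).

    Assumption: names are compared as plain strings; the product's real
    ordering rules are not described in this topic.
    """
    if start is not None and name < start:
        return False
    if end is not None and name >= end:
        return False
    return True


def select_for_harvest(names: Iterable[str],
                       start: Optional[str] = None,
                       end: Optional[str] = None) -> List[str]:
    """Filter first-level file and directory names down to one partition."""
    return [n for n in sorted(names) if in_partition(n, start, end)]


entries = ["archive", "finance", "hr", "legal", "readme.txt", "sales"]
# Start directory "finance" is included; End directory "legal" is not.
print(select_for_harvest(entries, start="finance", end="legal"))
```

Under these assumptions, a second volume with Start directory set to legal would pick up where this partition stops, so that no entry is harvested twice.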
    Table 2. NFS v2 and v3 primary volumes
    Field Value Notes
    Server type Select NFS v2 or NFS v3.  
    Server Enter the fully qualified name of the server where the volume is available for mounting.  
    Volume Enter the name or names of the volume to be mounted.  
    Initial directory Enter the name of the initial directory from which the harvest must begin. With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories Specify a regular expression that identifies the directories to include in each harvest. These directories are "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Access times Select one of these options:
    • Reset access times but do not synchronize them. (This setting is the default setting.)
    • Do not reset or synchronize access times.
    • Reset and synchronize access times on incremental harvests.
    Constraints Select one of these options:
    • Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    • Scope harvests on these volumes by extension: Include or exclude data objects that are based on extension.
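The Include directories row takes a regular expression that is applied to the "first node" directories below the starting directory. Before you enter an expression, it can help to try it against a list of directory names. The following Python sketch does that with the standard re module. The regular expression dialect that IBM StoredIQ uses and whether it matches the whole name are assumptions here, and the directory names and the function name are hypothetical.

```python
import re
from typing import Iterable, List


def included_directories(first_nodes: Iterable[str], pattern: str) -> List[str]:
    """Return the first-node directory names that the pattern matches.

    Assumption: the whole directory name must match the expression.
    """
    rx = re.compile(pattern)
    return [d for d in first_nodes if rx.fullmatch(d)]


# Hypothetical first-level directories below the initial directory.
dirs = ["projects_2021", "projects_2022", "scratch", "users"]
print(included_directories(dirs, r"projects_\d{4}|users"))
```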
    Table 3. Exchange primary volumes
    Field Value Notes
    Server type Select Exchange.  
    Version In the Version list, select the appropriate version. Options include 2000/2003, 2007, 2010/2013/2016, and Online. Exchange Online volumes require some prerequisite configuration. For more information, see Registering IBM StoredIQ as a Microsoft service application for access to Exchange Online.
    Server Enter the fully qualified name of the server where the volume is available for mounting. Alternatively, you can enter a server alias. For Exchange primary volumes, this is the fully qualified domain name of the server that hosts OWA. Multiple Client Access servers on Exchange 2007 are supported. The server load must be balanced at the IP or DNS level.

    If you selected Online as the Version option, this field fills automatically with the Exchange Online server name.

    Mailbox server When you configure multiple client access servers, enter the names of one or more mailbox servers, separated by commas. For Exchange primary volumes, this is the fully qualified domain name of the server that hosts the mailbox to be harvested.

    If you selected Online as the Version option, this field is not available.

    Active Directory server Enter the name of the Active Directory server. It must be a fully qualified Active Directory server.

    If you selected Online as the Version option, this field is not available.

    Authentication URL This field is prepopulated with the appropriate URL for authenticating with Azure AD. Do not change this value. This field is available only if you selected Online as the Version option.
    Impersonation scope This field is prepopulated with the management scope for the impersonation. This scope defines the group of accounts for which impersonation is allowed. Do not change this value. This field is available only if you selected Online as the Version option.
    Impersonation account Enter the user account to use for connecting to Exchange Online. This account must be authorized to impersonate the members of the specified impersonation scope. This field is available only if you selected Online as the Version option.
    Client ID Enter the application (client) ID under which IBM StoredIQ is registered with Microsoft. This field is available only if you selected Online as the Version option.
    Client secret Enter the client secret that is associated with the client ID. The values make up the credentials for access to a Microsoft Exchange Online data source. This field is available only if you selected Online as the Version option.
    Protocol To use SSL, select the Protocol checkbox. If you selected Online as the Version option, the Use SSL checkbox is automatically selected, and this field cannot be edited.
    Connect as Enter the logon ID that is used to connect and mount the defined volume. If you selected Online as the Version option, this field is not available.
    Password Enter the password that is used to connect and mount the defined volume. If you selected Online as the Version option, this field is not available.
    Volume Enter the name or names of the volume to be mounted. For Exchange, enter a friendly name for the volume.
    Folder Select either the Mailboxes or the Public folders option.
    Initial directory Enter the name of the initial directory from which the harvest must begin. For Exchange, this field must be left blank if you are harvesting all mailboxes. If you are harvesting a single mailbox, enter the email address for that mailbox.
    Virtual root The name defaults to the correct endpoint for the selected Exchange version.  
    Personal archives Select Harvest personal archive to harvest personal archives. This checkbox is available only when Exchange 2010/2013/2016 or Online is selected.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Remove journal envelope   When selected, the journal envelope is removed.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
    Include directories Specify a regular expression that identifies the directories to include in each harvest. These directories are "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Use Start directory and End directory to scope the harvest to a date range. Specify the start of the date range in the format YYYY-MM-DD. The date range is relative to the initial directory. For example, you can create an Exchange volume for the user John Doe's mailbox by setting that mailbox as initial directory and then limit the harvest to mails in this mailbox within the date range defined by the start and end directories.
    End directory Use Start directory and End directory to scope the harvest to a date range. Specify the end of the date range in the format YYYY-MM-DD. Harvesting stops at the first email with this date stamp.
    Constraints Select one of these options:
    • Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
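For Exchange volumes, the Start directory and End directory rows in Table 3 hold dates in the format YYYY-MM-DD rather than directory names. The following Python sketch parses two such values and checks that they form a usable range. Treating the end date as exclusive is an interpretation of "Harvesting stops at the first email with this date stamp", and the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Tuple


def parse_range(start_directory: str, end_directory: str) -> Tuple[date, date]:
    """Parse the two field values as YYYY-MM-DD dates and check their order."""
    start = datetime.strptime(start_directory, "%Y-%m-%d").date()
    end = datetime.strptime(end_directory, "%Y-%m-%d").date()
    if start >= end:
        raise ValueError("Start directory must be earlier than End directory")
    return start, end


def in_range(message_date: date, start: date, end: date) -> bool:
    """Assumption: the end date itself is not part of the harvest."""
    return start <= message_date < end


start, end = parse_range("2023-01-01", "2023-04-01")
print(in_range(date(2023, 3, 31), start, end))  # inside the range
print(in_range(date(2023, 4, 1), start, end))   # on the end date
```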
    Table 4. SharePoint primary volumes
    Prerequisites: For SharePoint volume prerequisites and configuration information, see Configuration of SharePoint and Special note about SharePoint volumes.
    Field Value Notes
    Server type SharePoint Required.
    Version Select one of these versions: 2003, 2007, 2010, 2013, 2016, or Online. Required.
    Server The fully qualified name of the SharePoint server. Required. When you add SharePoint volumes that contain spaces in the URL, see Special Note: Adding SharePoint Volumes.
    Active Directory server The name of the Active Directory server. Optional. Specify the fully qualified Active Directory server name. This option is not available for SharePoint Online.
    Protocol Select Use SSL only if SSL is enabled for this SharePoint server. For SharePoint Online, this option is automatically selected and cannot be edited. Optional.

    If SSL is enabled on the SharePoint server and you do not select this option, no volume is created and the HTTP status code 301 Moved Permanently is returned. To fix the issue, select the option.

    If SSL is not enabled on the SharePoint server and you select this option, no volume is created and the socket error [Errno 111] Connection Refused is returned. To fix this issue, clear the Use SSL checkbox.

    Connect as Enter the name of a user with the required permissions for that site collection. Use the following syntax:
    • SharePoint Online:
      userid@Microsoft_cloudname.com
    • Other SharePoint versions:
      Active Directory Domain Name\username
    Required. Use a site collection administrator account.
    No volume can be added if the validation of the credentials fails, which can happen for the following reasons:
    • The user does not exist or does not have the required permissions.
    • The password is not correct.
    The HTTP status code is usually 401 Unauthorized. However, for SharePoint Online, the HTTP status code 400 Bad Request is returned for insufficient permissions.
    Password Enter the password for the user specified in Connect as. Required.
    Volume Enter the URL of the SharePoint site collection, for example: /portal/site Required. Do not include the SharePoint server name in the URL; otherwise, the URL cannot be located on the server and no volume is created.

    When you add SharePoint volumes that contain spaces in the URL, see Special Note: Adding SharePoint Volumes.

    Initial directory Enter the name of the subsite from which you want the harvest to start. Optional.
    • With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
    • When you add SharePoint volumes that contain spaces in the URL, see Special Note: Adding SharePoint Volumes.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers
    • Include content tagging and full-text index
    Both options are selected by default.
    Tip: Leave the Include metadata for contained objects checkbox selected to have metadata for objects in containers added to the metadata index. To avoid creating a full-text index for the entire volume, clear the Include content tagging and full-text index checkbox. Create a full-text index for a subset of data later by running a Step-up Analytics action.

    For SharePoint Online, full-text indexing of OneNote notebook objects, that is, Notes®, is not supported currently. FSMD-based searches for these files are supported.

    Subsites To check all sites and subsites of the site collection for data objects, select Recurse into subsites. Optional.
    Versions To harvest all document versions, select Include all versions. Optional. IBM StoredIQ supports indexing versions from SharePoint. For more information, see Special Note: Adding SharePoint Volumes.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories Specify a regular expression that identifies the directories to include in each harvest. These directories are "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Constraints Select one of these options:
    • Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
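The Protocol and Connect as rows in Table 4 list several errors that can prevent a SharePoint volume from being created. The following Python sketch collects them into a small lookup that maps an error message to the fix that the table suggests. It is only a troubleshooting aid under the assumption that the error text contains the status code or errno shown in the table; the function name is hypothetical and is not part of IBM StoredIQ.

```python
def suggest_fix(symptom: str, sharepoint_online: bool = False) -> str:
    """Map an error message to the fix described in the SharePoint table."""
    text = symptom.lower()
    if "301" in text:
        # SSL is enabled on the server, but Use SSL was not selected.
        return "Select the Use SSL checkbox."
    if "errno 111" in text or "connection refused" in text:
        # SSL is not enabled on the server, but Use SSL was selected.
        return "Clear the Use SSL checkbox."
    if "401" in text or ("400" in text and sharepoint_online):
        # Credential validation failed; SharePoint Online reports
        # insufficient permissions as 400 Bad Request.
        return "Check the Connect as user, its permissions, and the password."
    return "Not covered by this table."


print(suggest_fix("HTTP 301 Moved Permanently"))
print(suggest_fix("[Errno 111] Connection Refused"))
print(suggest_fix("400 Bad Request", sharepoint_online=True))
```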
    Table 5. Documentum primary volumes
    Prerequisites: Before you can add Documentum volumes, you must add the Documentum server. For more information, see Adding a Documentum server as a data source.
    Field Value Notes
    Server type Select Documentum. For Documentum, you must specify the doc broker. See Configuring Documentum.
    Doc base Enter the name of the Documentum repository. A Documentum repository contains cabinets, and cabinets contain folders and documents.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume.
    Volume Enter the name or names of the volume to be mounted. For Documentum, enter a friendly name for the volume.
    Harvest To enable harvesting all document versions, select Harvest all document versions.
    Important: If you do not select this option for the initial harvest, changing the setting later does not have an effect when the volume is reharvested. As a workaround, create a new volume and ensure that the Harvest all document versions option is set before you start harvesting.
    Initial directory Enter the name of the initial directory from which the harvest must begin. With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
    Include directories Specify a regular expression that identifies the directories to include in each harvest. These directories are "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Constraints Select one of these options:
    • Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    Table 6. Domino primary volumes
    Field Value Notes
    Server type Select Domino. For Domino, you must first upload at least one user.id. See Adding Domino as a Primary Volume.
    Server Enter the fully qualified name of the server where the volume is available for mounting.
    It can happen that the data server cannot find the Domino server based on this information because the DNS resolution fails. In this case, ping the Domino server. If the ping is successful, add an entry to the data server's /etc/hosts file. The entry must consist of the common name portion of the Domino server name (omit the domain portion) and the IP address associated with the Domino server, for example:
    198.51.100.0 NALLN999
    For Domino, select the appropriate user name, which was entered with the Configuration subtab in the Lotus Notes® user administration area.
    Connect as Enter the logon ID that is used to connect and mount the defined volume. For Domino, select the user name for the primary user ID. The user ID must be configured on the System Configuration screen under the Lotus Notes user administration link.
    Password Enter the password that is used to connect and mount the defined volume. For Domino, enter the password for the primary user ID.
    Volume Enter the name or names of the volume to be mounted. For Domino, enter a friendly name for the volume.
    Harvest
    • To harvest mailboxes, select the Harvest mailboxes option.
    • To harvest mail journals, select the Harvest mail journals option.
    • To harvest all applications, select the Harvest all applications option.
    This option obtains the list of all known Domino users and their NSFs. It then harvests those mailboxes unless it was pointed to a single mailbox with the initial directory.
    Initial directory Enter the name of the initial directory.  
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories

    Specify a regular expression that identifies the directories to include in each harvest. These directories are "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Constraints Select one of these options:
    • Only use __ connection process(es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
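The Server row in Table 6 describes adding an /etc/hosts entry when the data server cannot resolve the Domino server name. The following Python sketch builds such an entry from a server name and an IP address. It assumes that the common name is the part of the server name before the first dot or slash, which is an assumption about the naming scheme; the host name NALLN999.example.com and the function name are hypothetical. The sketch only prints the line; it does not modify /etc/hosts.

```python
import ipaddress
import re


def hosts_entry(domino_server_name: str, ip: str) -> str:
    """Build an /etc/hosts line: IP address followed by the common name."""
    ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
    # Assumption: the common name is everything before the first "." or "/".
    common_name = re.split(r"[./]", domino_server_name, maxsplit=1)[0]
    if not common_name:
        raise ValueError("Server name has no common name portion")
    return f"{ip} {common_name}"


print(hosts_entry("NALLN999.example.com", "198.51.100.0"))
```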
    Table 7. FileNet primary volumes
    Field Value Notes
    Server type Select FileNet. Within IBM StoredIQ Data Server, the FileNet® domain must be configured before any FileNet volumes are created. See Configuring FileNet.
    FileNet config Select the FileNet server that you want to use for this configuration. For more information, see Configuring FileNet.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume.
    Domain Domain name automatically populates.  
    Object store Select an object store. The object store must exist before you create a FileNet primary volume.
    Volume Enter the name or names of the volume to be mounted.  
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    Table 8. NewsGator primary volumes
    Field Value Notes
    Server type Select NewsGator.  
    Server Enter the fully qualified name of the server where the volume is available for mounting.  
    Protocol To use SSL, select the Protocol checkbox.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume.  
    Volume Enter the name or names of the volume to be mounted. Enter a friendly name for the volume.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
    Table 10. Jive primary volumes
    Field Value Notes
    Server type Select Jive.  
    Server Enter the fully qualified name of the server where the volume is available for mounting.  
    Protocol To use SSL, select the Protocol checkbox.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume.  
    Volume Enter the name or names of the volume to be mounted.  
    Initial directory Enter the name of the initial directory from which the harvest must begin. With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Versions To harvest all document versions, select Include all versions.  
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories Specify a regular expression for included directories for each harvest (if it was specified). These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
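The Include directories row filters on "first node" directories, that is, the first path component below the starting directory. The following Python sketch shows one way to read that rule. It assumes POSIX-style paths, Python regular expression syntax, and a match against the whole directory name; the regular expression dialect and matching mode that IBM StoredIQ uses are not stated here, and the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath


def first_node(path, start_dir="/"):
    """Return the first path component below start_dir, or None."""
    try:
        rel = PurePosixPath(path).relative_to(start_dir)
    except ValueError:
        return None  # path is not underneath start_dir
    return rel.parts[0] if rel.parts else None


def is_included(path, pattern, start_dir="/"):
    """Assumption: the pattern must match the entire first-node name."""
    node = first_node(path, start_dir)
    return node is not None and re.fullmatch(pattern, node) is not None


# Hypothetical object paths on a volume whose starting directory is /projects.
paths = [
    "/projects/finance/q1.txt",
    "/projects/legal/nda.doc",
    "/projects/tmp/scratch.txt",
]
included = [p for p in paths if is_included(p, r"finance|legal", "/projects")]
print(included)
```

Only the objects under the finance and legal first-node directories pass the filter in this example.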
    Table 11. Chatter primary volumes
    Field Value Notes
    Server type Select Chatter. For Chatter, see Configuring Chatter messages.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume.
    Auth token Enter the token that is used to authenticate the Chatter volume. The auth token must match the user name that is used in the Connect as field. Auth tokens can be generated online on Salesforce. See Configuring Chatter messages.
    Volume Enter the name or names of the volume to be mounted.  
    Initial directory Enter the name of the initial directory from which the harvest must begin. With this feature, you can select a volume further down the directory tree rather than selecting an entire volume.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories Specify a regular expression for included directories for each harvest (if it was specified). These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    Table 12. IBM Content Manager primary volumes
    Field Value Notes
    Server type Select IBM Content Manager.  
    Server Enter the fully qualified host name of the library server database.  
    Port Enter the port that is used to access the library server database.  
    Repository Enter the name of the library server database.  
    Database type Select the type of database that is associated with the volume. Options include DB2 and Oracle. By default, DB2 is selected.  
    Schema Enter the schema of the library server database.  
    Remote database Enter the name of the remote database. Optional.
    Connection String Enter any additional parameters. Optional.
    Harvest itemtype Enter the names of the item types to be harvested, separated by commas. Required.
    Copy to itemtype Only SiqDocument is supported. For more information, see IBM Content Manager attributes.
    Connect as Enter the logon ID that is used to connect and mount the defined volume.
    Note: This Content Manager user ID must have access to all documents so that it can create documents in the SiqDocument item type for Copy to actions. If the SiqDocument item type does not exist, use a Content Manager administration ID, because this ID creates the item type when the volume is created.
     
    Password Enter the password that is used to connect and mount the defined volume.
    Volume Enter the name or names of the volume to be mounted.  
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    Both options are selected by default.
    Validation To validate volume accessibility, select Validation. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.
    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    • Scope harvests on these volumes by extension: Designate which data objects must be harvested by entering those object extensions.
     
    Table 13. CMIS primary volumes
    Field Value Notes
    Server type Select CMIS.  
    Server Enter the fully qualified name of the server where the volume is available for mounting.  
    Port Enter the port number.  
    Repository Enter the name of the repository.  
    Service In the Service text box, enter the name of the service.  
    Protocol To use SSL, select the Protocol checkbox.  
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume.  
    Volume Enter the name or names of the volume to be mounted.  
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    Table 14. HDFS primary volumes
    Field Value Notes
    Server type Select HDFS. Required.
    Server Enter the host name or IP address. Required.
    Port Enter the port number.  
    Repository Enter the name of the repository.  
    Option string This option is supported: VerifyCertificate=True. It indicates whether the validity of the HDFS server's SSL certificate is verified when SSL is used. Valid values are True and False. If no value is specified, the default is False. To validate the certificate on the HDFS server, specify this option and set the value to True.
    Protocol To use SSL, select the Protocol checkbox.  
    Connect as Enter the logon ID that is used to connect and mount the defined volume.  
    Password Enter the password that is used to connect and mount the defined volume. Authentication to HDFS is not supported. If your HDFS server requires a password, StoredIQ is not able to connect to it.
    Volume Enter the name or names of the volume to be mounted.  
    Initial directory Enter the name of the initial directory from which the harvest must begin. To avoid the interrogator timeout issue, see the Note at the end of this table.
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Validation To validate volume accessibility, select this option.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories    
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

     
    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

     
    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    • Scope harvests on these volumes by extension: Designate which data objects must be harvested by entering those object extensions.
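The Option string row in this table describes a single setting, VerifyCertificate, that is treated as False unless it is set to True. The following Python sketch illustrates that default. It is not the product's parser: the function name is hypothetical, and treating any value other than the literal True as False is an assumption.

```python
def verify_certificate(option_string):
    """Return the effective VerifyCertificate setting.

    Assumptions: the option string holds at most one key=value pair,
    and only the literal value "True" enables verification.
    """
    key, sep, value = option_string.strip().partition("=")
    if key.strip() != "VerifyCertificate" or not sep:
        return False  # option not specified: the default is False
    return value.strip() == "True"


print(verify_certificate("VerifyCertificate=True"))
print(verify_certificate(""))
```

The first call reports that verification is on, and the second call shows the default when the field is left empty.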
     
    Note: If you harvest HDFS volumes with many files in a directory, an interrogator timeout might occur, resulting in a Skipped directory exception in the harvest audit. HDFS responds to IBM StoredIQ slowly when it handles large directories, and more responses from HDFS must be processed. The slow response is caused by high CPU usage on the HDFS NameNode. Therefore, if an interrogator timeout occurs and you observe high CPU usage on the HDFS server, you can allocate more CPU resources to the HDFS server.

    To avoid the interrogator timeout issue, you can also limit the number of files in a directory to 250,000. Because each directory has its own timeout, having fewer files in a single directory helps ensure efficient operation. Splitting large directories into many small ones also helps resolve interrogator timeout issues. For example, 1,000,000 files that are equally distributed across 10 directories carry a lower risk of timeouts than the same files in one directory.
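The arithmetic behind the 250,000-file guideline can be sketched in Python as follows. The function names and the sample directory counts are hypothetical; how you obtain per-directory file counts from your HDFS cluster is outside the scope of this sketch.

```python
MAX_FILES_PER_DIR = 250_000  # guideline from the note above


def min_directories(total_files, limit=MAX_FILES_PER_DIR):
    """Smallest number of directories that keeps each at or below the limit."""
    return max(1, -(-total_files // limit))  # ceiling division


def oversized(file_counts, limit=MAX_FILES_PER_DIR):
    """Return the directories whose file count exceeds the limit."""
    return sorted(d for d, n in file_counts.items() if n > limit)


print(min_directories(1_000_000))  # 4

# Hypothetical per-directory file counts.
counts = {"/data/logs": 1_000_000, "/data/archive": 100_000}
print(oversized(counts))  # ['/data/logs']
```

Four directories is the minimum that satisfies the guideline for 1,000,000 files. Spreading the same files across 10 directories, as in the example in the note, leaves 100,000 files in each and therefore more headroom.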

    Table 15. Connections primary volumes
    Field Value Notes
    Server Type Select Connections in the Server type list. Required
    Server Enter the fully qualified domain name of the server from which the volume is available for mounting. Required
    Class name Enter deepfile.fs.template.impl.ibmconnections.ibmconnectionsconn.IBMConnections Required
    Repository name Enter deepfile.fs.template.impl.ibmconnections.ibmconnectionsconn Required
    Option string  
    Connect as Enter the user name of the account that is set up with admin and search-admin privileges on the Connections server. Required
    Password Enter the password of the account that is set up with admin and search-admin privileges on the Connections server. Required
    Volume Enter any name. Required
    Initial Directory  
    Index options Select either or both of the Index options checkboxes.
    • Include system metadata for data objects within containers.
    • Include content tagging and full-text index.
    These options are selected by default.
    Volume Enter the name or names of the volume to be mounted.  
    Validation To validate volume accessibility, select this option. When selected (the default state), IBM StoredIQ tests to see whether the volume can be accessed.

    If you define values for Start directory and End directory, IBM StoredIQ verifies that the objects specified there can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directories exist with names matching the strings you specified.

    Include directories Specify a regular expression for included directories for each harvest (if it was specified). These directories are defined as sets of "first node" directories, relative to the specified (or implied) starting directory, that are considered part of the logical volume.
    Start directory Specify a starting point for the harvest. This involves volume partitioning to break up a large volume. If an initial directory is defined, what you specify as the start directory must be underneath the initial directory.

    You can specify a file name or a directory name. Harvest starts with the specified file or directory. All files and directories with names in the range determined by Start directory and End directory are harvested. This includes files at root level or, if set, in the initial directory.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    End directory Determine the end point for the harvest. You can specify a file name or a directory name. All files and directories with names in the range determined by Start directory and End directory are harvested. However, what you specify as the end directory is not part of volume partitioning. The harvest is stopped at this file or directory; it is not included in the harvest.

    If validation of volume accessibility is enabled, IBM StoredIQ verifies that the objects specified as Start directory and End directory can be accessed. However, the validation is limited to the directory level. Therefore, make sure to clear the Validate volume accessibility checkbox if no directory exists with a name matching the strings you specified.

    Access Times Select one of these options:
    • Reset access times but do not synchronize them. (This setting is the default setting.)
    • Do not reset or synchronize access times.
    • Reset and synchronize access times on incremental harvests.
    Constraints Select one of these options:
    • Only use __ connection process (es): Specify a limit for the number of harvest connections to this volume. If the server is also being accessed for attribute and full-text searches, you might want to regulate the load on the server by limiting the harvester processes. The maximum number of harvest processes is automatically shown. This maximum number is set on the system configuration tab.
    • Control the number of parallel data object reads: Designate the number of parallel data object reads.
    • Scope harvests on these volumes by extension: Include or exclude data objects that are based on extension.
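As a checklist for the Connections table, the following Python sketch collects the fields that are marked Required and reports any that are still empty. IBM StoredIQ does not read such a dictionary; the key names, host name, account name, password, and volume name are placeholders chosen for illustration. Only the class name and repository name strings are taken from the table.

```python
CLASS_NAME = "deepfile.fs.template.impl.ibmconnections.ibmconnectionsconn.IBMConnections"
REPOSITORY_NAME = "deepfile.fs.template.impl.ibmconnections.ibmconnectionsconn"

# Hypothetical keys, one per field that the table marks as Required.
REQUIRED = (
    "server_type", "server", "class_name", "repository_name",
    "connect_as", "password", "volume",
)


def missing_required(settings):
    """Return the required keys that are absent or blank."""
    return [k for k in REQUIRED if not str(settings.get(k, "")).strip()]


settings = {
    "server_type": "Connections",
    "server": "connections.example.com",  # placeholder host name
    "class_name": CLASS_NAME,
    "repository_name": REPOSITORY_NAME,
    "connect_as": "harvest-admin",        # placeholder account
    "password": "change-me",              # placeholder
    "volume": "connections-main",         # any name is accepted
}
print(missing_required(settings))  # []
```

Writing the class name and repository name as single unbroken strings also guards against copying them with stray line breaks.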