Automated scanning of an IBM Storage Scale fileset

As an administrator, you can initiate an IBM Storage Scale scan from IBM Spectrum® Discover to collect system metadata from one or more IBM Storage Scale file sets.

Before you begin

This feature adds a requirement for non-root user IDs that are used to scan IBM Storage Scale data source systems. The feature runs the mmlsfileset command with root-level permissions to retrieve the list of available file sets from the target system. Therefore, if you use a non-root user ID, it must have sudo access to mmlsfileset for this function to work.

A non-root scan user is already required to have sudo access to mmapplypolicy, so this feature adds mmlsfileset as one more required command.
Note: You cannot query the available file sets on a target IBM Storage Scale connection or initiate a file set level scan unless you fulfill this requirement.
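The sudo requirement above can be sketched as a small helper that builds one sudoers entry covering both commands. This is only an illustration: the user name sdscan is hypothetical, the directory /usr/lpp/mmfs/bin is assumed to be where the IBM Storage Scale commands are installed, and the NOPASSWD tag is an assumption that might not match your site policy. Review any generated entry before you add it with visudo.

```python
from pathlib import PurePosixPath

# Assumption: directory where the IBM Storage Scale commands are installed.
SCALE_BIN_DIR = PurePosixPath("/usr/lpp/mmfs/bin")

# Commands that this topic names as requiring sudo access for a non-root scan user.
REQUIRED_COMMANDS = ("mmapplypolicy", "mmlsfileset")


def build_sudoers_entry(user: str, bin_dir: PurePosixPath = SCALE_BIN_DIR) -> str:
    """Return one sudoers line that lets `user` run the required commands as root."""
    if not user or any(ch.isspace() for ch in user):
        raise ValueError("user must be a non-empty name without whitespace")
    command_paths = ", ".join(str(bin_dir / name) for name in REQUIRED_COMMANDS)
    # Assumption: passwordless sudo (NOPASSWD) is acceptable at your site.
    return f"{user} ALL=(root) NOPASSWD: {command_paths}"


if __name__ == "__main__":
    # "sdscan" is a hypothetical scan user ID.
    print(build_sudoers_entry("sdscan"))
```

After the entry is in place, running sudo -l as the scan user is one way to confirm that both commands are listed.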

About this task

Scanning one or more IBM Storage Scale file sets inserts or updates the records for the files that IBM Spectrum Discover finds in those file sets. Because the scan is scoped to the specified file sets, it completes faster than a scan of the entire file system. You can specify multiple file sets in a single scan operation, but the file sets are scanned one after another.

As the scan progresses, the status message is updated to indicate the following information:
  • Which file set is being scanned.
  • When data operations (such as transferring files or indexing data) occur.
You can view this status message in the GUI on the data source connections table, or you can query it by using the REST API.
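If you query the status from a script, a polling loop along the following lines might be useful. It is a sketch only: this topic does not give the REST endpoint or the exact message text, so the lookup is passed in as a hypothetical fetch_status callable, and the messages in the example are invented for illustration.

```python
import time
from typing import Callable, Iterator, Optional


def watch_scan_status(
    fetch_status: Callable[[], str],
    is_finished: Callable[[str], bool],
    interval_seconds: float = 30.0,
    max_polls: int = 1000,
) -> Iterator[str]:
    """Yield each new status message until is_finished() accepts one.

    fetch_status is a hypothetical callable that returns the current status
    message for one connection, for example by calling the REST API.
    """
    previous: Optional[str] = None
    for _ in range(max_polls):
        status = fetch_status()
        if status != previous:
            # Report only changes, such as moving on to the next file set.
            yield status
            previous = status
        if is_finished(status):
            return
        time.sleep(interval_seconds)
    raise TimeoutError("scan status did not reach a finished state")


if __name__ == "__main__":
    # Invented messages that stand in for real responses.
    canned = iter([
        "Scanning file set fs1",
        "Scanning file set fs1",
        "Transferring files",
        "Scanning file set fs2",
        "Complete",
    ])
    for message in watch_scan_status(
        lambda: next(canned), lambda s: s == "Complete", interval_seconds=0
    ):
        print(message)
```

Passing the lookup in as a callable keeps the loop independent of any particular endpoint or authentication scheme.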

This feature works irrespective of whether the data is returned to IBM Spectrum Discover by using a direct Kafka connection or by using the file copy method. After a file set level scan completes, a scan generation is recorded or committed.

Additionally, an internal reclamation policy is generated to remove the records of any deleted files that did not appear in the updated scan. The scope of this reclamation policy is limited to the file set that is scanned, so it does not affect the records of other file sets or the actual file system. This scoping helps you achieve consistency with the source IBM Storage Scale system at file set level granularity.
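The effect of a file set level scan followed by scoped reclamation can be pictured with sets. The sketch below is a conceptual model only, not the internal policy that IBM Spectrum Discover generates; the record shape (file set name, path) and the function name are assumptions made for illustration.

```python
from typing import Set, Tuple

# Simplified record: (file set name, file path). Real records hold more metadata.
Record = Tuple[str, str]


def reconcile_fileset(index: Set[Record], fileset: str, scanned_paths: Set[str]) -> Set[Record]:
    """Return the index as it would look after scanning one file set."""
    # Records of the scanned file set whose files were not seen by the new scan.
    stale = {rec for rec in index if rec[0] == fileset and rec[1] not in scanned_paths}
    # Records that the new scan inserts or updates.
    found = {(fileset, path) for path in scanned_paths}
    # Records that belong to other file sets are never touched.
    return (index - stale) | found


if __name__ == "__main__":
    index = {("fs1", "/a"), ("fs1", "/b"), ("fs2", "/c")}
    # The new scan of fs1 sees /a and a new file /d; /b was deleted at the source.
    print(sorted(reconcile_fileset(index, "fs1", {"/a", "/d"})))
```

In this model the record for /b is removed because it belongs to the scanned file set fs1, while the fs2 record survives even though fs2 was not scanned.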

Procedure

  1. Go to the IBM Spectrum Discover GUI.
  2. Click menu and go to Data connections > Connections table.
    Select the data source connection name and click Scan now, which opens the Select scan type dialog box.
    You can select whether to scan the entire file system or to scan a list of file sets.
    Important: Connection types other than IBM Storage Scale and SMB/CIFS do not open this dialog box. For those connection types, Scan now continues to work as before and immediately initiates a full connection scan.
  3. Select either Scan all to scan all file sets or Select Filesets to scan specific file sets.

    Selecting Scan all initiates a full scan of the file system. If you choose to scan all file sets, click Scan to run the scan.

    Selecting Select Filesets initiates a scan of specific file sets. Click Next to open the Select Individual Filesets dialog box. Use this dialog box to select the specific file sets that you want to scan. Search the table by using the table search header:
    1. You can select file sets by clicking the row of the table that represents that file set. Clicking the row highlights that row and the count under View X selected filesets increases by 1.
    2. You can also find file sets by filtering the table with search criteria. For example, you can enter fs to display all file sets whose names contain those characters in that order. Click the table row of each file set that you want to include in the scan. For ease of review, you can filter the table to show only the selected file sets by clicking View X selected filesets.

    While you view only the selected file sets, the button changes to View all filesets. Click the button again to go back to viewing all available file sets.

  4. After you select all the file sets that you want to scan, initiate the scan by clicking Scan. Clicking Scan takes you to the Connections table.
    A notification indicates when the scan starts (or that the scan fails if there is a problem). You can view the status of the scan on the table in the Scan Status column for the target connection.
    Remember: When a check mark appears in the Scan Status column for the connection, the scan is complete for all selected file sets.
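The search filter in step 3 matches file set names that contain the typed characters in order. A minimal sketch of that behavior follows, assuming a plain case-sensitive substring match; the topic does not state how the GUI treats letter case, and the file set names are invented.

```python
from typing import Iterable, List


def filter_filesets(names: Iterable[str], query: str) -> List[str]:
    """Return the names that contain `query` as a contiguous substring."""
    # Assumption: case-sensitive match; the GUI might treat case differently.
    return [name for name in names if query in name]


if __name__ == "__main__":
    # Invented file set names for illustration.
    available = ["fs1", "root", "projfs", "data"]
    print(filter_filesets(available, "fs"))
```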