Creating an IBM Storage Scale data source connection

You can use the IBM Spectrum® Discover graphical user interface to create data source connections to the source storage systems.

Procedure

  1. Log in to the IBM Spectrum Discover web interface with a user ID that has the Data Admin role.

    The Data Admin access role is required for creating connections. For more information, see Managing User Access in the Data Cataloging: Administration Guide.

  2. Click the menu and go to Data connections > Connections. The page lists the existing data source connections with their connection name, connection type, cluster, data source, site, state, scan status, and next scan, and provides the Add Connection button.
  3. Click Add connection to open the Add data source connection window.
    You can enter the connection name and select the connection type. The connection types are:
    • IBM Storage Scale
    • IBM Cloud® Object Storage
    • Network File System (NFS)
  4. Complete the following steps:
    1. In the Connection name field, enter a name for the connection.
    2. Select the type of connection from the Connection type drop-down list.
  5. Set the connection type to IBM Storage Scale. The page displays fields for the connection name, user, password, working directory, and scan directory. You can also schedule a data scan, select a collection, or enable live events.

    If you click Enable live events, you can enable the IBM Storage Scale watch folder on the specified file system.

  6. Enter values for all the fields of the IBM Storage Scale connection, and click Submit Connection.

    For IBM Storage Scale connections, you can enter the following information:
    Connection name
    The name of the connection, an identifier for the user. For example, filesystem1.
    Note: It must be a unique name within IBM Spectrum Discover.
    User
    A user ID that has permissions to connect to the data source system and initiate a scan.
    Password
    The password for the user ID that is specified in User.
    Authentication Type
    Password authentication uses the provided password to authenticate with the IBM Storage Scale cluster. Shared RSA key authentication performs passwordless authentication by using a private key that is provided by the system administrator and whose public key exists in the authorized keys for the specified user on the Scale host.
    Note: IBM Spectrum Discover 2.0.3.1 removes the support for self-generated RSA key pairs for IBM Spectrum Discover. Any existing connections that use that method are updated to password-based authentication, and the self-generated key pair is removed during the upgrade to 2.0.3.1 or later. If the password for the scan user that is stored in IBM Spectrum Discover is no longer valid, scans can fail after the update. To rectify this, you must edit the connection and provide a valid password for the scan user or a valid RSA private key for authentication.
    Working Directory
    A scratch directory on the source data system where IBM Spectrum Discover can put its temporary files.
    Note: When you edit an existing connection and change the User from a root user to a non-root user, you must also change the Working Directory. This change is necessary because the non-root User cannot access the files that were previously created by the root user in the existing Working Directory.
    Scan Directory
    The root directory of the scan. All files and directories under this directory are scanned. Typically, this directory is the base directory of the file system. For example, /gpfs/fs1.
    Connection Type
    The type of source storage system this connection represents.
    Site
    An optional physical location tag that an administrator can provide to see the physical distribution of their data.
    Cluster
    The IBM Storage Scale or GPFS cluster name. To obtain the cluster name, run the following command on the IBM Storage Scale system: /usr/lpp/mmfs/bin/mmlscluster.
    Host
    The hostname or IP address of an IBM Storage Scale node from which a scan can be initiated, for example a quorum-manager node.
    File system
    The short name (omit /dev/) of the file system to be scanned. For example, fs1.
    Note: It is important to exactly match the file system name (data source) that IBM Storage Scale populates in the scan file. Run the following command on the IBM Storage Scale system: /usr/lpp/mmfs/bin/mmlsmount all
    Node list
    The comma-delimited list of nodes or node classes that participate in the scan of an IBM Storage Scale file system. For example, scale01,scale02.
    Node  Daemon node name  IP address    Admin node name  Designation
    --------------------------------------------------------------------
       1   msys111-10g       172.16.8.111  msys111-dmz      quorum-manager-perfmon
       2   msys112-10g       172.16.8.112  msys112-dmz      quorum-manager-perfmon
       3   msys113-10g       172.16.8.113  msys113-dmz      quorum-manager-perfmon
    Note: When you create data source connections for IBM Storage Scale file systems, it is important to exactly match the cluster name and the file system name (data source) that IBM Storage Scale populates in the scan file.
    Run the following commands on the IBM Storage Scale system.
    1. Run the following command to display information about the GPFS cluster:
      $ /usr/lpp/mmfs/bin/mmlscluster
      
      GPFS cluster information
      ========================
       GPFS cluster name:         modevvm19.tuc.example.com
       GPFS cluster id:           7146749509622277333
       GPFS UID domain:           modevvm19.tuc.example.com
       Remote shell command:      /usr/bin/ssh
       Remote file copy command:  /usr/bin/scp
       Repository type:           CCR
      Node  Daemon node name           IP address    Admin node name            Designation
      ----------------------------------------------------------------------------------------
      1     modevvm19.tuc.example.com  203.0.113.24  modevvm19.tuc.example.com  quorum-manager
    2. Run the following command to display information about file systems that are mounted on the nodes:
      $ /usr/lpp/mmfs/bin/mmlsmount all
      File system gpfs0 is mounted on 1 nodes.
      File system Data_Science_8M is mounted on 7 nodes.
      File system icp4D_data_fs_master1 is mounted on 8 nodes.
      File system icp4D_data_fs_master2 is mounted on 8 nodes.
      File system icp4D_data_fs_master3 is mounted on 8 nodes.
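
Because the Cluster and File system fields must match the names that IBM Storage Scale reports exactly, it can help to extract those names from the command output instead of retyping them. The following Python code is a minimal sketch, assuming the output has the same layout as the samples above. The function names are hypothetical helpers for illustration; they are not part of IBM Storage Scale or IBM Spectrum Discover.

```python
import re
from typing import List


def parse_cluster_name(mmlscluster_output: str) -> str:
    """Return the value of the 'GPFS cluster name:' line."""
    for line in mmlscluster_output.splitlines():
        match = re.match(r"\s*GPFS cluster name:\s*(\S+)", line)
        if match:
            # Assumption: trailing punctuation is not part of the name.
            return match.group(1).rstrip(",;")
    raise ValueError("no 'GPFS cluster name:' line found")


def parse_mounted_file_systems(mmlsmount_output: str) -> List[str]:
    """Return the file system names from 'File system X is mounted on N nodes' lines."""
    names = []
    for line in mmlsmount_output.splitlines():
        match = re.match(r"\s*File system (\S+) is mounted on \d+ nodes?", line)
        if match:
            names.append(match.group(1))
    return names


def short_fs_name(name: str) -> str:
    """Strip a leading /dev/ so the value fits the File system field."""
    prefix = "/dev/"
    return name[len(prefix):] if name.startswith(prefix) else name
```

You might capture the output of /usr/lpp/mmfs/bin/mmlscluster and /usr/lpp/mmfs/bin/mmlsmount all on the IBM Storage Scale system, pass the text to these helpers, and paste the returned values into the Cluster and File system fields.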