Creating an IBM Storage Scale data source connection
You can use the IBM Spectrum® Discover graphical user interface to create data connections from the source storage systems.
Procedure
- Log in to the IBM Spectrum Discover web interface with a user ID that has the Data Admin role associated with it.
The Data Admin role is required for creating connections. For more information, see Managing User Access in the Data Cataloging: Administration Guide.
- Click the menu and go to Data connections > Connections. The page lists each connection's name, connection type, cluster, data source, site, state, scan status, and next scan, and includes the Add Connection button.
- Click Add connection to open the Add data source connection window, where you enter the connection name and select the connection type. The connection types are:
- IBM Storage Scale
- IBM Cloud® Object Storage
- Network File System (NFS)
- Complete the following steps:
- In the Connection name field, enter a name for the connection.
- Choose the type of connection from the Connection type drop-down list.
- Set the connection type to IBM Storage Scale. The page displays fields where you can enter the connection name, user, password, working directory, and scan directory. You can also schedule a data scan, select a collection, or enable live events.
If you click Enable live events, the IBM Storage Scale watch folder is enabled on the specified file system.
- Complete the values for all the fields for the IBM Storage Scale connection type, and click Submit Connection.
For IBM Storage Scale connections, you can enter the following information:
- Connection name
- The name of the connection, an identifier for the user. For example, filesystem1.
Note: The name must be unique within IBM Spectrum Discover.
- User
- A user ID that has permissions to connect to the data source system and initiate a scan.
- Password
- The password for the user ID specified in user.
- Authentication Type
- Password authentication uses the provided password to authenticate with the IBM Storage Scale cluster. Shared RSA key authentication performs passwordless authentication by using a private key that is provided by the system administrator; the corresponding public key must exist in the authorized keys for the specified user on the Scale host.
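For shared RSA key authentication, the key pair is typically prepared with standard OpenSSH tooling. A minimal sketch, assuming ssh-keygen is available on the administrator's workstation; the key path is a stand-in, and the scanuser@scale-node1 target shown in the comment is hypothetical:

```shell
# Sketch: preparing an RSA key pair for passwordless scan-user
# authentication. The private key is what you supply to the
# connection; the public key must land in the scan user's
# authorized keys on the Scale host.
set -e
KEYDIR=$(mktemp -d)   # stand-in location for the key pair

# Generate an RSA key pair with no passphrase.
ssh-keygen -t rsa -b 4096 -N "" -f "$KEYDIR/scale_scan_key" -q

# Install the public key on the Scale host (hypothetical user/host):
#   ssh-copy-id -i "$KEYDIR/scale_scan_key.pub" scanuser@scale-node1

ls "$KEYDIR"
```

The private key stays with the administrator until it is entered into the connection; only the public key is copied to the Scale host.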
Note: IBM Spectrum Discover 2.0.3.1 removes support for the self-generated RSA key pair for IBM Spectrum Discover. Any existing connections that use that method are updated to password-based authentication, and the self-generated key pair is removed during the upgrade to 2.0.3.1 or later. If the password for the scan user that is stored in IBM Spectrum Discover is no longer valid, scans can fail after the update. To rectify this, edit the connection and provide a valid password for the scan user or a valid RSA private key for authentication.
- Working Directory
- A scratch directory on the source data system where IBM Spectrum Discover can put its temporary files.
Note: When you edit an existing connection and change the User from a root user to a non-root user, you must also change the Working Directory. This change is necessary because the non-root user cannot access files that were previously created by the root user in the existing Working Directory.
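Preparing the scratch directory so that the scan user owns it avoids the root/non-root permission problem noted above. A minimal sketch, using the current user and a temporary path as stand-ins for the real scan user and a real path such as /gpfs/fs1/.discover_work:

```shell
# Sketch: create a fresh working directory owned by the scan user.
# The current user and a temp path stand in for the real values.
set -e
SCANUSER=$(id -un)                  # stand-in for the real scan user
WORKDIR=$(mktemp -d)/discover_work  # stand-in, e.g. /gpfs/fs1/.discover_work

mkdir -p "$WORKDIR"
chown "$SCANUSER" "$WORKDIR"        # the scan user must own the directory
chmod 700 "$WORKDIR"                # and have read/write access to it
```

On a real cluster, run these commands as root on the Scale host, substituting the actual scan user and scratch path.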
- Scan Directory
- The root directory of the scan. All files and directories under this directory are scanned. Typically, this directory is the base directory of the file system. For example, /gpfs/fs1.
- Connection Type
- The type of source storage system this connection represents.
- Site
- An optional physical location tag that an administrator can provide to see the physical distribution of their data.
- Cluster
- The IBM Storage Scale or GPFS cluster name. To obtain the cluster name, run the following command on the IBM Storage Scale file system: /usr/lpp/mmfs/bin/mmlscluster.
- Host
- The hostname or IP address of an IBM Storage Scale node from which a scan can be initiated, for example a quorum-manager node.
- File system
- The short name (omit /dev/) of the file system to be scanned. For example, fs1.
Note: It is important to exactly match the file system name (data source) that IBM Storage Scale populates in the scan file. Run the following command on the IBM Storage Scale system:
/usr/lpp/mmfs/bin/mmlsmount all
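Because the connection expects the short name rather than the device path, the /dev/ prefix can be stripped with plain shell parameter expansion. A minimal sketch; /dev/fs1 is an example device path:

```shell
# Derive the short file-system name (omit /dev/) for the
# File system field. /dev/fs1 is an example device path.
device=/dev/fs1
fsname=${device#/dev/}   # remove the leading /dev/ prefix
echo "$fsname"           # fs1
```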
- Node list
- The comma-delimited list of nodes or node classes that participates in the scan of an IBM Storage
Scale file system. For example,
scale01,scale02
.Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 msys111-10g 172.16.8.111 msys111-dmz quorum-manager-perfmon 2 msys112-10g 172.16.8.112 msys112-dmz quorum-manager-perfmon 3 msys113-10g 172.16.8.113 msys113-dmz quorum-manager-perfmon
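When many nodes participate, the comma-delimited node list can be assembled from mmlscluster output. A minimal sketch that parses the sample node table above; on a live cluster you would pipe the output of /usr/lpp/mmfs/bin/mmlscluster into awk instead of the here-document:

```shell
# Sketch: build a comma-delimited Node list from the daemon node
# names (second column) of nodes designated quorum-manager.
nodelist=$(awk '/quorum-manager/ {names = names sep $2; sep = ","} END {print names}' <<'EOF'
   1   msys111-10g   172.16.8.111   msys111-dmz   quorum-manager-perfmon
   2   msys112-10g   172.16.8.112   msys112-dmz   quorum-manager-perfmon
   3   msys113-10g   172.16.8.113   msys113-dmz   quorum-manager-perfmon
EOF
)
echo "$nodelist"   # msys111-10g,msys112-10g,msys113-10g
```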
Note: When you create data source connections for IBM Storage Scale file systems, it is important to exactly match the cluster name and the file system name (data source) that IBM Storage Scale populates in the scan file. Run the following commands on the IBM Storage Scale system.
- Run the following command to display information about the GPFS cluster:
$ /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         modevvm19.tuc.example.com
  GPFS cluster id:           7146749509622277333
  GPFS UID domain:           modevvm19.tuc.example.com
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name           IP address    Admin node name            Designation
----------------------------------------------------------------------------------------
   1   modevvm19.tuc.example.com  203.0.113.24  modevvm19.tuc.example.com  quorum-manager
- Run the following command to display information about file systems that are mounted on the nodes:
$ /usr/lpp/mmfs/bin/mmlsmount all

File system gpfs0 is mounted on 1 nodes.
File system Data_Science_8M is mounted on 7 nodes.
File system icp4D_data_fs_master1 is mounted on 8 nodes.
File system icp4D_data_fs_master2 is mounted on 8 nodes.
File system icp4D_data_fs_master3 is mounted on 8 nodes.