Performing file system scan to collect metadata from IBM Spectrum Scale
You can use the file system scanning tool, IBM Spectrum Scale Scanner, to collect system metadata from IBM Spectrum Scale to be ingested into IBM Spectrum® Discover.
About this task
The scanner collects the following metadata for each file or object:
- The site where the file or object resides.
- The source storage platform that contains the file or object.
- The size of the file.
- The owner of the file.
- The subdirectory where the data resides.
- The name of the data.
- The permissions for the file (mode).
- The change time of the file metadata (inode).
- The time when the data was last modified.
- The time when the data was last accessed.
- The name of the IBM Spectrum Scale file system that is storing the data.
- The name of the IBM Spectrum Scale cluster.
- The IBM Spectrum Scale inode that is storing the data.
- The Linux® group associated with the file.
- The file set that stores the file.
- The storage pool where the file resides.
- If applicable, whether the data is migrated to tape or object.
- If applicable, the location of the data if it is migrated to tape or object.
- The scan generation, which is useful for tracking rescans.
The IBM Spectrum Scale Scanner tool also collects quota information by calling mmrepquota.
The tool consists of the following files:
- scale_scanner.py: The tool that starts the IBM Spectrum Scale ILM policy.
- scale_scanner.conf: The configuration file used to customize the behavior of the scale_scanner.py tool.
- createScanPolicy: The script that is called internally by the tool.
Install the IBM Spectrum Scale Scanner tool by unpacking the utility from the IBM Spectrum Discover node to the required location on the IBM Spectrum Scale cluster node.
Log in to the IBM Spectrum Discover node through Secure Shell (SSH) with the moadmin username and password.
Change to the directory that contains the IBM Spectrum Scale scanning utility.
Copy the createScanPolicy, scale_scanner.conf, and scale_scanner.py files to a node in the IBM Spectrum Scale cluster:
    scp * firstname.lastname@example.org:/my_scanner_directory
    createScanPolicy      100% 3320   3.2KB/s   00:00
    init.py               100%  427   0.4KB/s   00:00
    scale_scanner.conf    100% 1595   1.6KB/s   00:00
    scale_scanner.py      100%   13KB 13.2KB/s  00:00
On the IBM Spectrum Scale node where you install the scanning utility, edit the configuration file (scale_scanner.conf) as follows:
Use the IBM Spectrum Discover UI to create a connection to the IBM Spectrum Scale system on which you want to start a manual scan. Set the required fields, including scandir, and optionally set fields such as site, in the [spectrumscale] stanza of the file.
    [spectrumscale]
    # Spectrum Scale Filesystem which hosts the scan directory
    # example: /dev/gpfs0
    filesystem=/dev/gpfs0
    # The directory path on Spectrum Scale Filesystem to perform scan on
    # example: /gpfs0
    # specifies a global directory to be used for temporary storage during
    # mmapplypolicy command processing. The specified directory must be
    # mounted with read/write access within a shared file system
    mountpoint=mount point of the gpfs filesystem
    # It is unclear what the mount_point should be, but setting the mount point
    # to the mount point of the scale file system on the IBM Spectrum Scale node works.
    scandir=/gpfs0
    # The directory to store output data from the scan in (default is
    # scandir)
    outputdir=
    # The site tag to specify a physical location or organization identifier.
    # If you use this field, remove the comment (#)
    #site=
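As a rough illustration of how the [spectrumscale] stanza could be read, the following Python sketch parses the fields with the standard configparser module and applies the documented default for outputdir. It is not the code in scale_scanner.py. The helper name load_scale_settings and the inline sample text are assumptions made for this example only.

```python
import configparser

# Sample stanza for illustration only; the values mirror the example above.
SAMPLE_CONF = """
[spectrumscale]
filesystem=/dev/gpfs0
scandir=/gpfs0
outputdir=
#site=
"""


def load_scale_settings(text):
    """Hypothetical helper: return the [spectrumscale] fields as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    stanza = parser["spectrumscale"]

    scandir = stanza.get("scandir", "").strip()
    if not scandir:
        raise ValueError("scandir must be set in the [spectrumscale] stanza")

    # An empty outputdir falls back to scandir, as the config comment states.
    outputdir = stanza.get("outputdir", "").strip() or scandir

    return {
        "filesystem": stanza.get("filesystem", "").strip(),
        "scandir": scandir,
        "outputdir": outputdir,
        # site is optional; None means the line is still commented out.
        "site": stanza.get("site"),
    }


settings = load_scale_settings(SAMPLE_CONF)
print(settings)
```

Because the site line is still commented out in the sample, the parser treats it as a comment and the sketch reports it as unset.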
Set the scale_connection, master_node_ip, and username fields in the [spectrumdiscover] stanza of the file.
Note:
scale_connection refers to the name of the IBM Spectrum Scale file system that is scanned and ingested into IBM Spectrum Discover. The scale_connection value must match the value that is defined in the Data Source column of the Data Connections page in the IBM Spectrum Discover GUI.
The username must be the name of a valid IBM Spectrum Discover user who has the dataadmin role. The username field takes the format <domain_name>/<username>. To determine a domain and username with the dataadmin role, go to the Access Users page in the IBM Spectrum Discover GUI and view the defined users.
For the local domain, you do not need to specify the domain as part of the username field because it is the default domain. For example, to define the username for user1, a user in the local domain who is assigned the dataadmin role, enter the following value in the configuration file:
    [spectrumdiscover]
    # Name of the Spectrum Scale connection to scan files from
    # Check using the Spectrum Discover connection manager APIs
    scale_connection=fs3
    # Spectrum Discover Master Node IP
    master_node_ip=203.0.113.23
    # Spectrum Discover user name, having 'dataadmin' role
    # Use format <domain_name>/<username>
    # e.g. username=Scale/scaleuser1
    username=user1
Note: The scanner writes approximately 1 KB of metadata to the output file for every file in the system. For example, for 12 M files, the expected output size is approximately 12 GB. By default, the output file is written to the same directory that is being scanned. The log file output location can be customized by setting the outputdir field.
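The username format for the [spectrumdiscover] stanza can be illustrated with a short Python sketch that splits a value into its domain and user parts. This is an assumption-laden example, not the logic that scale_scanner.py uses: the helper name split_username is hypothetical, and the label "local" for the default domain is chosen here only to mirror the wording in the note.

```python
def split_username(value, default_domain="local"):
    """Hypothetical helper: split '<domain_name>/<username>' into its parts.

    A value without a slash is treated as a user in the default domain.
    """
    value = value.strip()
    if not value:
        raise ValueError("username must not be empty")

    domain, sep, user = value.partition("/")
    if not sep:
        # No domain given, for example username=user1
        return default_domain, value
    if not domain or not user:
        raise ValueError("expected <domain_name>/<username>, got %r" % value)
    return domain, user


print(split_username("Scale/scaleuser1"))  # ('Scale', 'scaleuser1')
print(split_username("user1"))             # ('local', 'user1')
```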
- Run the scan by using the following command:
    ./scale_scanner.py
Note: While the ./scale_scanner.py command is running, you can start another scan. If you do, ensure that you run it against a different connection that is online and is not currently being scanned. While the scanner is running, the Scan Now option is hidden automatically.
Note: When you run the scale_scanner.py script, you are prompted for the password of the IBM Spectrum Discover user that is configured in the username field of the [spectrumdiscover] section of the scale_scanner.conf file. You must provide the correct password for the configured user. As described in the configuration file, this user must be a valid user that is configured in the IBM Spectrum Discover Authentication service (Access management). Also, this user must be assigned the dataadmin role.
    $ ./scale_scanner.py
    Enter password for SD user 'user1':
    Scale Scan Policy is created at: ./scanScale.policy
Note:
- After you see a line similar to “0 ‘skipped’ files and/or errors”, press Enter to return to the command prompt.
- The scan takes approximately 2 minutes 30 seconds for every 10 M files on the following x86-based IBM Spectrum Scale cluster:
  - 4 M4 NSD client nodes
  - 2 M4 NSD server nodes
  - DCS3700 with 350 2 TB NL SAS drives and 20 200 GB SSDs
  - QDR InfiniBand cluster network
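The sizing figures quoted in the notes (about 1 KB of output per scanned file, and about 2 minutes 30 seconds per 10 M files on the reference cluster) lend themselves to a quick back-of-the-envelope estimate. The following Python sketch assumes both figures scale linearly with the file count, which might not hold on different hardware; the function name estimate_scan is hypothetical.

```python
KB_PER_FILE = 1              # approximate metadata written per scanned file
MINUTES_PER_10M_FILES = 2.5  # measured on the reference cluster only


def estimate_scan(file_count):
    """Return (output size in GB, scan time in minutes) for file_count files."""
    # Decimal units are assumed: 1 GB = 1,000,000 KB.
    output_gb = file_count * KB_PER_FILE / 1_000_000
    minutes = file_count / 10_000_000 * MINUTES_PER_10M_FILES
    return output_gb, minutes


size_gb, minutes = estimate_scan(12_000_000)
print(f"about {size_gb:.0f} GB of output, about {minutes:.0f} minutes")
```

Treat the result as a planning aid only; the time figure in particular depends on the node count, storage, and network of the cluster that runs the scan.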