Manual scanning of an IBM Storage Scale data source

This topic describes how to configure IBM Data Cataloging to connect to IBM Storage Scale. After you complete these steps, data can be ingested from an IBM Storage Scale data source into IBM Data Cataloging for metadata indexing.

Before you begin

Create the data source connection to IBM Storage Scale. For more information, see Configure data source connections.
You can include or exclude snapshot files during the initial IBM Storage Scale scan by setting the following environment variable:
INCLUDE_SCALE_SNAPSHOTS
When INCLUDE_SCALE_SNAPSHOTS is set to 'false' (the default), the IBM Storage Scale scan excludes all files inside .snapshots directories. When it is set to 'true', the scan includes all files, including those inside .snapshots directories.
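
The effect of the variable can be illustrated with a minimal Python sketch. This is not the product's scan implementation. The function names, the example paths, and the choice to treat any value other than 'true' as false are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

SNAPSHOT_DIR = ".snapshots"


def parse_flag(value):
    """Interpret the variable's string value.

    Assumption: a missing value, or any value other than 'true', counts as false.
    """
    return (value or "false").strip().lower() == "true"


def is_in_snapshot(path):
    """Return True if any component of the path is a .snapshots directory."""
    return SNAPSHOT_DIR in PurePosixPath(path).parts


def select_paths(paths, include_snapshots):
    """Keep every path when snapshots are included; otherwise drop snapshot paths."""
    if include_snapshots:
        return list(paths)
    return [p for p in paths if not is_in_snapshot(p)]


# Hypothetical example paths.
example_paths = [
    "/gpfs/fs1/projects/report.txt",
    "/gpfs/fs1/.snapshots/daily/projects/report.txt",
]
default_selection = select_paths(example_paths, parse_flag("false"))
full_selection = select_paths(example_paths, parse_flag("true"))
```

With the default value, only the first example path is kept. With 'true', both paths are kept.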

To set the INCLUDE_SCALE_SNAPSHOTS variable by using a configmap, see Enabling skip snapshot directories feature on Red Hat® OpenShift® in the IBM Storage Scale: Administration Guide.

The minimum connection parameters required for manual scanning are:
  • Connection Name
  • Connection Type
  • Cluster
  • Filesystem
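
As a rough pre-flight check, the four parameters above could be verified before a manual scan is attempted. The sketch below is only an illustration: the dictionary layout, the function name, and the example values are hypothetical and do not reflect how IBM Data Cataloging stores connection definitions.

```python
# The four minimum parameters listed above.
REQUIRED_PARAMETERS = ("Connection Name", "Connection Type", "Cluster", "Filesystem")


def missing_parameters(connection):
    """Return the required parameters that are absent or blank in a connection dict."""
    return [
        key
        for key in REQUIRED_PARAMETERS
        if not str(connection.get(key, "")).strip()
    ]


# Hypothetical, incomplete connection definition.
example_connection = {
    "Connection Name": "scale-fs1",
    "Connection Type": "example-type",
    "Cluster": "",
}
missing = missing_parameters(example_connection)
```

Here `missing` lists Cluster and Filesystem, because one is blank and the other is absent.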
Restriction: IBM Data Cataloging uses the unit separator character (ASCII code 0x1F) as the field delimiter for ingestion into the database. If a path, file, or object name contains this character, the input data is parsed incorrectly and IBM Data Cataloging rejects the affected records.
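
To see why an embedded unit separator breaks parsing, consider the following Python sketch. The three-field record layout (path, size, owner) is a hypothetical simplification, not the actual scan output format. The helper names are likewise assumptions. A check such as `find_unsafe_names` might be used to spot problematic names before ingestion.

```python
UNIT_SEPARATOR = "\x1f"  # ASCII 0x1F, the field delimiter named in the restriction


def find_unsafe_names(names):
    """Return the names that contain the unit separator character."""
    return [name for name in names if UNIT_SEPARATOR in name]


def field_count(record_line):
    """Count the fields obtained by splitting a delimited record line."""
    return len(record_line.split(UNIT_SEPARATOR))


# Hypothetical three-field records: path, size, owner.
good_record = UNIT_SEPARATOR.join(["/gpfs/fs1/report.txt", "1024", "alice"])
bad_record = UNIT_SEPARATOR.join(["/gpfs/fs1/re\x1fport.txt", "1024", "alice"])

good_fields = field_count(good_record)  # three fields, as intended
bad_fields = field_count(bad_record)    # one extra field, caused by the embedded 0x1F
```

The second record splits into four fields instead of three, which is the kind of mismatch that leads to a rejected record.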

Procedure

  1. Perform a file system scan to collect system metadata from IBM Storage Scale for ingestion into IBM Data Cataloging. For more information, see Performing file system scan to collect metadata from IBM Storage Scale.
  2. Copy the output of the file system scan to the IBM Data Cataloging master node. For more information, see Copying the output of the IBM Storage Scale file system scan to the IBM Data Cataloging master node.
  3. Ingest data from the file system scan in IBM Data Cataloging. For more information, see Ingesting metadata from IBM Storage Scale file system scan in IBM Data Cataloging.
  4. Ingest quota information from the file system. For more information, see Ingesting quota information from the file system.