Copying data by using the ScaleAFM application

The ScaleAFM application supports a copy function between an IBM Storage Scale connection and an IBM Cloud® Object Storage connection.

Before you begin

  • You must know which IBM Storage Scale versions are supported. For more information, see IBM Spectrum® Storage software requirements in the Data Cataloging: Concepts, Planning, and Deployment Guide.

  • You need to configure the Active File Management (AFM) functions on IBM Storage Scale filesets under a file system. With this configuration, AFM works as a cache for a bucket or vault on an IBM Cloud Object Storage system, or on another Cloud Object Storage data source that supports the Amazon S3 protocol, as a target.

    For more information, see the topic Introduction to AFM to cloud object storage in the IBM Storage Scale Concepts, Planning, and Installation Guide.

  • You need to create the IBM Storage Scale connection in IBM Spectrum Discover by configuring the filesystem in the IBM Storage Scale cluster, which contains the AFM fileset, with the Cloud Object Store target vault that is already configured.
  • The IBM Spectrum Discover ScaleAFM application uses the same user that is configured in IBM Spectrum Discover to scan the IBM Storage Scale connection and run the data movement policies. For this data movement, the IBM Storage Scale connection is used as the destination_connection. For more information, see Creating or identifying a user ID and password for scanning in the Data Cataloging: Concepts, Planning, and Deployment Guide.

About this task

The IBM Spectrum Discover ScaleAFM application provides the advanced copy function between the IBM Storage Scale connection and the IBM Cloud Object Storage connection (IBM Cloud Object Store or an S3 connection).

The application uses the system metadata and custom metadata of the documents that IBM Spectrum Discover collects. This metadata drives the advanced data management function that copies a selected set of data from a vault on the IBM Cloud Object Storage connection to an AFM fileset that is configured on an IBM Storage Scale connection.

The application also provides a cognitive data management capability by analyzing the results of the IBM Spectrum Discover AUTOTAG, Content Search, and Deep Inspect policies.

AFM to Cloud Object Storage is an IBM Storage Scale feature that enables files or objects in an IBM Storage Scale cluster to be placed on a Cloud Object Storage system. Cloud object services such as Amazon S3 and IBM Cloud Object Storage offer industry-leading scalability, data availability, security, and performance.

AFM to Cloud Object Storage associates an IBM Storage Scale fileset with a Cloud Object Storage system. You can use Cloud Object Storage to store large amounts of data for use cases such as mobile applications, backup and restore, enterprise applications, and big data analytics. The data is stored both locally on IBM Storage Scale filesets and on the Cloud Object Storage system. For more information, see the topic Introduction to AFM to cloud object storage in the IBM Storage Scale Concepts, Planning, and Installation Guide.

Note:

The functions of the ScaleAFM application depend on the capability of the underlying storage hardware and its management software, specifically the IBM Storage Scale AFM functions for Cloud Object Storage.

For more information, see the topic Active File Management for IBM Spectrum Scale in the IBM Spectrum Scale: Administration Guide.

Procedure

  1. Log in to the IBM Spectrum Discover GUI.
  2. Go to the Administration > Data Management page.
  3. Click Add Policy to create a policy.
  4. Enter a policy name.
  5. Select COPY from the Type list.
  6. Click Proceed to Configure.
  7. Select ScaleAFM from the Application list.
  8. Enter a Policy filter in the Filter field.
    The policy filter includes the criteria for selecting the files to copy from the source Cloud Object Storage data source to the destination IBM Storage Scale data source.
    For example, the policy filter can be filetype="pdf".
  9. Select the connection type from the Source Connection list.
    The list comprises the connections of type Cloud Object Storage or Simple Storage Service (S3) that are configured in IBM Spectrum Discover. The source connection is the source from which the data is copied, such as the name of the defined IBM Cloud Object Storage connection.
  10. Select the connection type from the Destination Connection list.
    The list comprises the connections of type ‘Spectrum Scale’ that are configured in IBM Spectrum Discover. The destination connection is the target to which the data is copied, such as the name of the defined IBM Spectrum Scale connection.
    Note: The IBM Storage Scale file system that is configured as the data source for the destination IBM Storage Scale connection must already contain a fileset that has the IBM Storage Scale AFM functions enabled and a Cloud Object Storage vault configured as the target. This vault must match the one that is configured as the Source Connection in step 9.
  11. Enter the fileset value in the Spectrum scale_afm_fileset field.
    This value is the fileset that is configured as the AFM fileset under the destination IBM Storage Scale connection, as explained in step 10.
  12. Enter the directory path in the Target base dir field.
    The Target base dir defines the absolute path of a base directory on the destination IBM Storage Scale system to which files from the source system are downloaded. It must be an absolute directory path under the link path of the destination IBM Storage Scale fileset. The field value must begin with a '/'. If it is not specified, the link path of the IBM Storage Scale AFM fileset that is configured with the IBM Storage Scale connection is used as the target_base_dir.
  13. Click Proceed to Schedule to configure a schedule for the policy.
  14. Click the slider control to select one of the following policy statuses:
    Active
    An active policy is run whenever its scheduling event is reached.
    Inactive
    An inactive policy is not run even when its scheduling event is reached, including the Now event.
  15. Select a policy schedule under Frequency.
    The schedule indicates when the policy is run.
  16. Click Proceed to review to review the policy.
  17. Click Submit to create the policy.
    • When the policy is completed, you can update the IBM Spectrum Discover catalog for the destination IBM Storage Scale connection by manually starting a scan of the connection.
    • If the destination IBM Storage Scale connection was created with live events enabled, the IBM Spectrum Discover catalog is updated automatically after a short time with entries for the files that the recently run ScaleAFM data management policy copied from the Cloud Object Storage data source.
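
The Target base dir rules in step 12 can be illustrated with a short Python sketch. It is not part of the ScaleAFM application or of any IBM Spectrum Discover API. The function name resolve_target_base_dir and the example paths are hypothetical, and the sketch assumes that "absolute path under the link path" means a full file system path that lies at or below the fileset link path.

```python
from posixpath import normpath
from typing import Optional


def resolve_target_base_dir(link_path: str,
                            target_base_dir: Optional[str] = None) -> str:
    """Return the effective target base directory for a COPY policy.

    link_path:       link path of the AFM fileset (hypothetical input).
    target_base_dir: value typed in the Target base dir field, or
                     None / "" when the field is left empty.
    """
    if not link_path.startswith("/"):
        raise ValueError("fileset link path must be absolute")
    link = normpath(link_path)

    # Field left empty: the fileset link path is used as the default.
    if not target_base_dir:
        return link

    # The field value must begin with '/'.
    if not target_base_dir.startswith("/"):
        raise ValueError("target base dir must begin with '/'")

    # Assumption: the directory must be the link path itself or lie below it.
    target = normpath(target_base_dir)
    if target != link and not target.startswith(link.rstrip("/") + "/"):
        raise ValueError("target base dir must be under the fileset link path")
    return target


if __name__ == "__main__":
    # Hypothetical link path of an AFM fileset.
    link = "/gpfs/fs1/afm_cos"
    print(resolve_target_base_dir(link))
    print(resolve_target_base_dir(link, "/gpfs/fs1/afm_cos/reports"))
```

A check of this kind before you submit the policy can catch a relative path, or a path outside the AFM fileset, while the policy is still being edited.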