Using a third-party data movement application to manage data

Introduction to data movement with IBM Spectrum® Discover.

Before you begin

  • The third-party application must be registered with IBM Spectrum Discover.
  • IBM Spectrum Discover must scan both the source and destination data sources before the data movement.
  • The application must be configured to access the same source and destination data sources.
For details, see the third-party application documentation.

About this task

The capabilities of IBM Spectrum Discover can be extended by applications. For example, data mover applications can move data between data sources, based on the IBM Spectrum Discover insights. Applications register with IBM Spectrum Discover, providing details of the operations that they support and the parameters that they need to perform each operation.

The following table lists the supported data movement operations:

Table 1. Data movement operations
Operation   Source Type               Destination Type
MOVE        NFS, S3, COS, SMB/CIFS    NFS, S3, COS, SMB/CIFS
COPY        NFS, S3, COS, SMB/CIFS    NFS, S3, COS, SMB/CIFS
TIER        NFS, S3, COS, SMB/CIFS    NFS, S3, COS, SMB/CIFS

Note:

Currently, an application cannot register new operations beyond those listed in the Data movement operations table.

By using registered data movement applications, you can create data movement policies that identify files and objects that are candidates for data operations. The IBM Spectrum Discover policy engine generates job request messages in JSON format. Each message contains a batch of files or objects with the source and target information, and any additional options such as preserving source system timestamps.

The job request messages are put onto a Kafka egress topic. The data movement application reads the messages from the topic, performs the necessary data movement operation, and provides response messages in JSON format on a Kafka ingress topic.
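The request and response exchange described above can be sketched as follows. This is a minimal illustration in Python, not the product's actual message schema: the field names (action_id, docs, path, status) and the move_one callback are hypothetical, and the Kafka consumer and producer are deliberately left out so that the sketch stays self-contained.

```python
import json
from typing import Callable


def handle_job_request(raw_message: str, move_one: Callable[[dict], bool]) -> str:
    """Process one job request message and return the JSON response message.

    All field names used here are hypothetical placeholders. The real
    schema is defined by IBM Spectrum Discover and the application.
    """
    request = json.loads(raw_message)
    results = []
    for doc in request.get("docs", []):
        try:
            # move_one performs the actual move, copy, or tier for one entry.
            ok = move_one(doc)
        except OSError:
            # Record a failure for this entry and continue with the batch.
            ok = False
        results.append(
            {"path": doc.get("path"), "status": "success" if ok else "failed"}
        )
    response = {"action_id": request.get("action_id"), "docs": results}
    return json.dumps(response)
```

In a real application, raw_message would be read from the egress topic and the returned string would be written to the ingress topic. Handling each entry independently keeps one failed file from failing the whole batch.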

IBM Spectrum Discover can interact with third-party data movement applications to move, copy, or tier data between data sources. Create data management policies on IBM Spectrum Discover, and specify the set of documents to process by using a policy filter. The policy filter can be based on the system metadata or the custom metadata of documents that are collected by IBM Spectrum Discover.

The third-party application registers with IBM Spectrum Discover, providing the operations that it supports (move, copy, or tier). For each operation, it gives the list of parameters that it needs to perform the operation.

When you create the data management policy in IBM Spectrum Discover, you define the filter, the parameter values, and when the policy must run. When the policy runs, IBM Spectrum Discover sends the list of files and any additional parameters to the data movement application. The application processes the files and returns a status summary to IBM Spectrum Discover, which displays the summary to you.

Note:
  • During data migration, the migrated files need to preserve the IBM Spectrum Discover tags. You can follow a manual procedure to preserve the tags. For more information, see Preserving tags during data movement.
  • A third-party data movement application can register a schema, defining the parameters it requires for data movement. You need to set these parameters in the application user interface (UI) while creating the policy. Invalid parameters cause an error when you submit a policy.

    However, in a few cases, because of the complexity of the schema, the error might not point to the precise location of the problem, or it might indicate that a valid parameter is invalid. Therefore, if an error is raised when you submit a policy and it is not clear where the problem is, check all of the policy parameters.
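Checking all of the policy parameters can be done systematically. The following Python sketch reports every problem it finds instead of stopping at the first one. The schema layout (a dictionary with "required" and "allowed" keys) and the parameter names are hypothetical and are used only for illustration; the actual schema is whatever the third-party application registers.

```python
from typing import List


def check_policy_parameters(params: dict, schema: dict) -> List[str]:
    """Return a list of every problem found in params.

    schema maps a parameter name to a rule such as
    {"required": True, "allowed": ["yes", "no"]}. This layout is a
    hypothetical simplification, not the registered schema format.
    """
    problems = []
    for name, rule in schema.items():
        if name not in params:
            if rule.get("required", False):
                problems.append(f"missing required parameter: {name}")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and params[name] not in allowed:
            problems.append(f"invalid value for {name}: {params[name]!r}")
    # Parameters that the schema does not define are reported as well.
    for name in params:
        if name not in schema:
            problems.append(f"unknown parameter: {name}")
    return problems
```

An empty list means that no problems were found under the assumed rules.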

Procedure

  1. Log in to the IBM Spectrum Discover GUI.
  2. Go to Admin > Management Policies.
  3. To create a policy, click Add Policy.
  4. Click the slider control and set the status to one of the following values:
    Active
    An active policy runs whenever its scheduling event is reached.
    Inactive
    An inactive policy does not run even when its scheduling event is reached (including a NOW event).
  5. Enter a policy name.
  6. Enter a policy filter.
    The policy filter includes the criteria for selecting the files for moving or copying. For example, filetype="pdf" selects all files of type PDF.
    Note:
    • IBM Spectrum Discover tracks the last known migration status for each file in the ‘STATE’ facet. This can be leveraged when defining data movement policies to either target or avoid files in a particular state. The following values represent the file status:
      migrtd (migrated)
      File contents are only present on the target system but a stub file exists on the source.
      resdnt (resident)
      File contents are only present on the source system.
    • Filtering by the 'STATE' facet is useful in the following scenarios:
      • In a TIER policy where files that are already migrated shouldn't be migrated again.
      • In a COPY policy where files that are migrated shouldn’t be copied as this would result in a recall of the data.
      • In a TIER policy where some files are being recalled in preparation for a workload. Any files that are already resident don’t need to be recalled.
    • To target files that are resident, simply add “state = ‘resdnt’” to the filter criteria. To target files that are NOT resident, simply add “state <> ‘resdnt’” to the filter criteria.
  7. To select the policy type, click Next Step.
  8. Select MOVE, COPY, or TIER as the policy type.
  9. Select the agent name as the Agent.
  10. Enter the remaining parameters. The parameters that are displayed depend on the application, and these parameters might include:
    Source connection type
    Indicates the type of connection that the files currently reside on.
    Source connection
    Indicates the name of the connection that the files currently reside on.
    Destination connection type
    Indicates the type of connection that the files are being moved or copied to.
    Destination connection
    Indicates the name of the connection that the files are being moved or copied to.
    Force migrate
    Indicates whether to force a demigration (recall) of a file at the source location before the operation is performed, if the file was previously migrated to another location.
    Overwrite
    Indicates what action to take when a file already exists at the destination.
    Preserve attributes, timestamp, or permissions
    Indicates whether the file's metadata (attributes, timestamps, or permissions) is preserved.
  11. To enter a schedule, select Next Step.
    The schedule indicates when you want the move or the copy to start.
  12. To review the policy, select Next Step.
  13. To create the policy, select Submit. The policy runs at the scheduled time.
  14. When the policy runs or completes, view the execution status summary by clicking Policy Preview.
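
The policy filter from step 6 can also be composed programmatically before it is entered. The following Python sketch appends the state criterion to an existing filter. The helper name add_state_criterion is hypothetical, and joining the criteria with "and" and wrapping the original filter in parentheses are assumptions about the filter syntax rather than documented behavior.

```python
def add_state_criterion(base_filter: str, resident: bool) -> str:
    """Append a 'STATE' facet criterion to a policy filter string.

    resident=True targets files that are resident on the source;
    resident=False targets files that are NOT resident.
    """
    operator = "=" if resident else "<>"
    criterion = f"state {operator} 'resdnt'"
    base = base_filter.strip()
    if not base:
        return criterion
    # Assumption: criteria can be combined with "and" and grouped with
    # parentheses, as in SQL-style filter expressions.
    return f"({base}) and {criterion}"
```

For example, a TIER policy that must skip files that are already migrated could pass its existing filter with resident=True.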