Using a third-party data movement application to manage data
Introduction to data movement with IBM Spectrum® Discover.
Before you begin
- The third-party application must be registered with IBM Spectrum Discover.
- IBM Spectrum Discover must scan both the source and destination data sources before the data movement.
- The application must be configured to access the same source and destination data sources.
About this task
The capabilities of IBM Spectrum Discover can be extended by applications; for example, data mover applications can be used to move data between data sources, based on the IBM Spectrum Discover insights. Applications register with IBM Spectrum Discover, providing details of the operations they support and the parameters they need to fulfill each operation.
The following table lists the supported data movement operations:
Operation | Source Type | Destination Type |
---|---|---|
MOVE | NFS, S3, COS, SMB/CIFS | NFS, S3, COS, SMB/CIFS |
COPY | NFS, S3, COS, SMB/CIFS | NFS, S3, COS, SMB/CIFS |
TIER | NFS, S3, COS, SMB/CIFS | NFS, S3, COS, SMB/CIFS |
Currently, an application cannot register operations other than those listed in the Data movement operations table.
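The supported combinations in the table can be expressed as a small lookup, which an application might use to reject unsupported requests early. This is a minimal sketch only; the names `SUPPORTED_OPERATIONS`, `SUPPORTED_TYPES`, and `is_supported` are hypothetical and are not part of IBM Spectrum Discover.

```python
# Operations and data source types copied from the Data movement operations table.
# The set names and the function name are illustrative assumptions.
SUPPORTED_OPERATIONS = {"MOVE", "COPY", "TIER"}
SUPPORTED_TYPES = {"NFS", "S3", "COS", "SMB/CIFS"}


def is_supported(operation: str, source_type: str, destination_type: str) -> bool:
    """Return True if the combination appears in the supported operations table."""
    return (
        operation.upper() in SUPPORTED_OPERATIONS
        and source_type.upper() in SUPPORTED_TYPES
        and destination_type.upper() in SUPPORTED_TYPES
    )
```

For example, `is_supported("MOVE", "NFS", "S3")` is true, whereas an operation that is not in the table, such as a hypothetical `"DELETE"`, is rejected.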
By using registered data movement applications, you can create data movement policies that identify files and objects that are candidates for data operations. The IBM Spectrum Discover policy engine generates job request messages in JSON format. Each message contains a batch of files or objects with the source and target information, and any additional options, such as preserving source system timestamps.
The job request messages are put onto a Kafka egress topic. The data movement application reads the messages from the topic and performs the necessary data movement operation, and provides response messages in JSON format on a Kafka ingress topic.
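The request and response flow can be sketched as a function that takes one job request message and returns one response message. This is an illustration under stated assumptions: the JSON field names (`operation`, `files`, `source`, `target`, `results`, `status`) and the `move_one` callback are hypothetical, and the actual message schema is defined by IBM Spectrum Discover. Kafka client code is deliberately left out; the input string stands in for a message read from the egress topic, and the returned string for a message published to the ingress topic.

```python
import json
from typing import Callable


def handle_job_request(raw_message: str, move_one: Callable[[str, str], bool]) -> str:
    """Process one job request message and return the JSON response message.

    All field names used here are hypothetical placeholders, not the
    real IBM Spectrum Discover message schema.
    """
    request = json.loads(raw_message)
    results = []
    # Each request carries a batch of files or objects.
    for item in request["files"]:
        ok = move_one(item["source"], item["target"])
        results.append(
            {"source": item["source"], "status": "success" if ok else "failed"}
        )
    # Echo the operation so the response can be matched to the request.
    response = {"operation": request["operation"], "results": results}
    return json.dumps(response)
```

Keeping the message handling separate from the Kafka transport makes the logic easy to test with plain strings before it is connected to the topics.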
IBM Spectrum Discover can interact with third-party data movement applications to move, copy, or tier data between data sources. Create data management policies on IBM Spectrum Discover, and specify the set of documents to process by using a policy filter. The policy filter can be based on the system metadata or the custom metadata of documents that are collected by IBM Spectrum Discover.
The third-party application registers with IBM Spectrum Discover providing the operations that it supports (move, copy or tier). For each operation, it gives the list of parameter values that it needs to perform the operation.
When you create the data management policy in IBM Spectrum Discover, you define the filter, the parameter values, and when the policy must run. When the policy runs, IBM Spectrum Discover sends the list of files and any additional parameters to the data movement application. The application processes the files and returns a status summary to IBM Spectrum Discover, which displays the summary to you.
- During data migration, the migrated files need to preserve the IBM Spectrum Discover tags. You can follow a manual procedure to preserve the tags. For more information, see Preserving tags during data movement.
- A third-party data movement application can register a schema that defines the parameters it requires for data movement. You need to set these parameters in the application user interface (UI) while creating the policy. Invalid parameters cause an error when you submit a policy.
  However, in a few cases, because of the complexity of the schema, the error might not point to the precise location of the problem, or might indicate that a valid parameter is invalid. Therefore, if an error is raised when you submit a policy and the location of the problem is unclear, check all the policy parameters.
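Checking every parameter, rather than stopping at the first reported error, can be sketched as follows. This is a minimal illustration and not the validation that IBM Spectrum Discover performs; the schema layout (a dictionary of parameter names mapped to a `type` and a `required` flag) and the function name `check_all_parameters` are assumptions made for the example.

```python
def check_all_parameters(schema: dict, params: dict) -> list:
    """Return a list of problems found across all parameters, not just the first.

    The schema format is hypothetical:
    {name: {"type": <Python type>, "required": <bool>}}.
    """
    problems = []
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required", False):
                problems.append(f"{name}: missing required parameter")
            continue
        if not isinstance(params[name], spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
    # Report parameters that the schema does not define.
    for name in params:
        if name not in schema:
            problems.append(f"{name}: not defined in the schema")
    return problems
```

Collecting all problems in one pass gives a complete picture when a single error message is ambiguous about where the fault lies.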