Migrating DataStage jobs
You can migrate DataStage jobs by creating and importing ISX files that contain the job information. Complete other post-migration tasks where applicable.
Before you migrate, make sure to set up, scale, and provision storage for your DataStage® instance. For more information, see Administering DataStage.
Procedure
- Create and import the ISX file
- Migrate connections
- Migrate stages
- Review the parameter sets and PROJDEF values
- Update scripts that use the dsjob command line interface
- Migrate sequence jobs
- Rewrite the routine code for before-job and after-job subroutines
- Review the environment variables
- Update data types, data sets, file sets, dsenv files, and user-defined functions
Create and import the ISX file
Create and export an ISX file by using one of the methods that are listed in the following table:
Option | Instructions |
---|---|
ISTOOL | Use ISTOOL to create an ISX file and export the file. For instructions, see Export command for InfoSphere DataStage and QualityStage assets and How to use ISTOOL for EXPORT IMPORT Information Server Components. |
MettleCI | Use MettleCI, which is a third-party service, to convert a server job design into an equivalent parallel job design, then create an ISX file and export the file to your system. For more information, see the MettleCI docs. |
InfoSphere Information Server Manager GUI client | Use the Information Server Manager GUI client to export the ISX file. For detailed instructions, see Exporting assets. |
- Open an existing project or create a new one.
- From the Assets tab of the project, click the import icon.
- Click the Local file tab, then upload the ISX file from your local computer. Then, click Create.
Note: The ISX file must exist on your desktop or network drive. Do not drag the file as an attachment from another application.
The asset import report contains status information and error messages that you can use to troubleshoot your ISX import. For information on viewing and using the report to troubleshoot, see Asset import report.
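If you script your migrations, you can also import the ISX file from the command line. Recent releases of the cpdctl dsjob tool include a migrate subcommand for this purpose. The following is a minimal sketch in which the project name and file path are placeholders; confirm the flags that your cpdctl version supports with --help before relying on them:

```
# Import an ISX file into an existing project.
# Assumes cpdctl is already configured (see the source.sh example later in this topic).
cpdctl dsjob migrate --project MyMigratedProject --file-name ./export.isx
```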
Migrate connections
If your migrated jobs contain connections, see Migrating connections in DataStage for information.
Migrate stages
Stages | Considerations |
---|---|
 | Migration is not currently supported. |
SAP connectors | Migration is not supported. For jobs that include these connectors, review the job design, then re-create it by using the new SAP connectors, which have more features. |
Custom stages | See the following topic for considerations: Uploading the operator library file after you migrate a DataStage flow that contains a custom stage |
Java Transformer | See the following topic for considerations: Migrating the Java Transformer stage from traditional DataStage |
Web Service Transformer, Web Service Client | See the following topic for considerations: Migrating Web Service Transformer and Web Service Client stages from traditional DataStage |
 | Automatically converted to the Data service connector. |
Data Rules | You can use the cpdctl dsjob CLI to migrate a Data Rules stage into DataStage as a Quality Rule. See Enabling the migration of the Data Rules stage as an IBM Knowledge Catalog Quality Rule in DataStage. |
PxSurrogateKeyGenerator | When you migrate a job from traditional DataStage that has this stage, and the stage has both input and output links, the stage is automatically converted to the PxSurrogateKeyGeneratorN type of the Surrogate Key Generator stage. After migration, you must manually create a new surrogate key file. |
Address Verification | Install the reference data files in the ds-storage PVC. |
Stored procedure | Stored procedures are migrated to the corresponding platform connector. All stored procedures on Db2® type connectors are migrated to the standard Db2 connector, including stored procedures for connectors like Db2 for i and Db2 for z/OS®. Manually replace the Db2 connector with the correct connector type and copy over the stored procedure call. In some cases, the procedure is left as-is and must be updated after migration to match the new syntax. |
Exception | Exception stages are automatically converted to Peek stages. When a Data Rules stage is migrated as a Quality Rule, the Quality Rule can handle exceptions itself. |
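For the Address Verification stage, the reference data files must be placed in the ds-storage PVC. One way to copy files into the PVC from outside the cluster is the OpenShift CLI. This is only a sketch: the namespace, pod name, and target directory are placeholders for whatever mounts ds-storage in your deployment:

```
# Copy local reference data into a pod that mounts the ds-storage PVC.
# <namespace> and <px-runtime-pod> are placeholders for your deployment.
oc cp ./reference-data <namespace>/<px-runtime-pod>:/ds-storage/reference-data
```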
Review the parameter sets and PROJDEF values
Review your parameter sets and verify that their default values are correct after migration.
PROJDEF parameter sets are created and updated by migration. If you migrate a job with a PROJDEF parameter set, review the PROJDEF parameter set and specify default values for it. Then, within flows and job runs, any parameter value that is $PROJDEF uses the value from the PROJDEF parameter set.
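You can also review parameter sets from the command line. The following sketch assumes a project named MyProject and that your cpdctl version provides the list-paramsets and get-paramset subcommands (cpdctl configuration is described in the next section); confirm the exact flags with --help:

```
# List the parameter sets in the project, including any PROJDEF
# parameter set that migration created.
cpdctl dsjob list-paramsets --project MyProject

# Inspect PROJDEF and verify its default values.
cpdctl dsjob get-paramset --project MyProject --name PROJDEF
```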
Update scripts that use the dsjob command line interface
- Download cpdctl: https://github.com/IBM/cpdctl/releases/
- Create a source shell script (source.sh) to configure cpdctl, and create a text file key.txt for your encryption key. See the following example:

```
#!/bin/bash
export CPDCTL_ENCRYPTION_KEY_PATH=~/key.txt
export DSJOB_URL=https://example.com
export DSJOB_ZEN_URL=https://example.com
export CPDCTL_ENABLE_DSJOB=true
export CPDCTL_ENABLE_DATASTAGE=true
export DSJOB_USER=admin
export DSJOB_PWD=<Password>
cpdctl config user set dscpserver-user --username $DSJOB_USER --password $DSJOB_PWD
cpdctl config profile set dscpserver-profile --url $DSJOB_URL
cpdctl config context set dscpserver-context --user dscpserver-user --profile dscpserver-profile
cpdctl config context use dscpserver-context
cpdctl dsjob list-projects
```
- Change any references to dsjob to cpdctl dsjob. You might need to adjust the command-line options to fit the DataStage command-line style. See DataStage command-line tools.
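As an illustration of the kind of rewrite involved, the following sketch pairs a traditional dsjob invocation with a roughly equivalent cpdctl call. The project and job names are placeholders, and the exact options can differ by release, so verify them against DataStage command-line tools:

```
# Traditional DataStage CLI:
#   dsjob -run -jobstatus MyProject MyJob

# Comparable cpdctl style (waits up to 300 seconds and reports status):
cpdctl dsjob run --project MyProject --job MyJob --wait 300
```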
Migrate sequence jobs
You can import an ISX file to migrate a sequence job to a pipeline flow. Rewrite expressions in CEL and manually reselect values for some pipeline nodes. See the following topics for more considerations: Orchestrating flows with Watson Pipelines and Migrating and constructing pipeline flows for DataStage. See Migrating BASIC routines in DataStage for information on rewriting BASIC routines as scripts.
Rewrite the routine code for before-job and after-job subroutines
When you migrate before-job and after-job subroutines, the routine code is stored in a .sh script under /ds-storage/projects/<projectName>/scripts/DSU.<RoutineName>.sh. Rewrite the routine code in the same way as a BASIC routine, following the steps in Migrating BASIC routines in DataStage to retrieve the output arguments, but include an exit statement for the before/after-job subroutine. See the following example:

```
# TODO: Update the following json string and print it as the last line of the standard output.
ErrorCode=0
echo "{\"ErrorCode\":\"$ErrorCode\"}"
exit $ErrorCode
```
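Building on that template, a before-job subroutine that ran a shell command in traditional DataStage might be rewritten along these lines; the staging-directory cleanup is purely a hypothetical example of the routine's work:

```
#!/bin/bash
# Hypothetical before-job work: clear a staging directory before the job runs.
rm -rf /ds-storage/projects/MyProject/staging/*
ErrorCode=$?

# Print the result as the last line of standard output, then exit.
# A nonzero ErrorCode signals a failure to the job run.
echo "{\"ErrorCode\":\"$ErrorCode\"}"
exit $ErrorCode
```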
Review the environment variables
- APT_CONFIG_FILE
  The modern version of DataStage uses dynamic configuration file generation by default. If a migrated flow contains the APT_CONFIG_FILE environment variable, review the setting and either remove it, or create the configuration file on /px-storage or another accessible persistent volume so that the original job can run. For more information, see Creating and setting the APT_CONFIG_FILE environment variable in DataStage. A sample configuration file follows this list.
- APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING
  Go to Advanced under the Stage tab of the Transformer stage and select Legacy null processing.
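For reference, a static configuration file for APT_CONFIG_FILE conventionally uses the parallel-engine syntax shown in this minimal single-node sketch; the node name, fastname, and resource paths are placeholders that must match your environment:

```
{
  node "node1"
  {
    fastname "px-runtime"
    pools ""
    resource disk "/px-storage/datasets" {pools ""}
    resource scratchdisk "/px-storage/scratch" {pools ""}
  }
}
```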
Update data types, data sets, file sets, dsenv files, and user-defined functions
- Data types
  Most data types in traditional DataStage are mapped to the same data types in modern DataStage. Data of type Pathname is mapped to type Path.
- Data sets and file sets
  You can choose between two options:
  - Move the source data sets and file sets that are referenced by the source jobs into the corresponding locations on the target cluster. For example, you can copy data sets into the ds-storage PVC.
  - Find the original jobs that generated the data sets and file sets, update the target location, and rerun those jobs to re-create the data sets and file sets. The dsjob command-line utility replaces orchadmin. For more information, see the Data set commands in DataStage command-line tools and the example at the end of this section.
- dsenv files
  Complete the following steps:
  - Add any special environment variables or parameters that are in the dsenv file into the project runtime environment.
  - Archive and remove the dsenv file.
- User-defined functions
  If you are migrating a job that contains a parallel routine from traditional DataStage, you must create a function library to enable user-defined functions in the Transformer stage of modern DataStage. Create the new function library from the existing .so file that the parallel routine points to. Then, configure the library by setting the return data type for each function that you want to use. For more information, see Uploading the library file before you migrate a DataStage flow that contains user-defined functions.
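For the data set option described earlier, the dsjob CLI takes over the role of orchadmin. The following sketch assumes a project named MyProject and that your cpdctl version includes data set subcommands such as list-datasets; check the Data set commands in DataStage command-line tools for the exact names and flags:

```
# List the data sets in the project (replaces orchadmin listing).
cpdctl dsjob list-datasets --project MyProject
```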