Migrating DataStage jobs

You can migrate DataStage jobs by creating and importing ISX files that contain the job information. Complete other post-migration tasks where applicable.

Before you migrate, make sure to set up, scale, and provision storage for your DataStage® instance. For more information, see Administering DataStage.

Procedure

Create and import the ISX file

Create and export an ISX file by using one of the following methods:

  • ISTOOL: Use ISTOOL to create an ISX file and export the file. For instructions, see Export command for InfoSphere DataStage and QualityStage assets and How to use ISTOOL for EXPORT IMPORT Information Server Components. A sketch of an export command follows the notes below.
  • MettleCI: Use MettleCI, a third-party service, to convert a server job design into an equivalent parallel job design, then create an ISX file and export it to your system. For more information, see the MettleCI documentation.
  • InfoSphere Information Server Manager GUI client: Use the Information Server Manager GUI client to export the ISX file. For detailed instructions, see Exporting assets.
Note: Make sure that the ISX file export includes any dependencies, such as parameter sets and table definitions. If folder support is enabled, folder structures will be re-created on import.
Note: It is recommended that you scale up services such as DataStage and Watson™ Pipelines to the Large instance size before you import your .ISX file. After migration, fewer resources are required and you can scale down. If you experience issues even with the Large instance size, you might need to customize your configuration. For more information, see Large ISX imports produce gateway timeout and compilation errors.
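
For example, an ISTOOL export might look like the following sketch. The host names, credentials, archive path, and project path are placeholders, and the option syntax can vary by Information Server version, so verify it against the Export command documentation:

    istool export -domain services-host:9443 -username isadmin -password <password> \
      -archive /tmp/MyProject_export.isx \
      -datastage ' "-includedependent" "engine-host/MyProject/Jobs/*/*.*" '
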
Complete the following steps to import the ISX file:
  1. Open an existing project or create a new one.
  2. From the Assets tab of the project, click New asset > Graphical builders > DataStage.
  3. Click the Local file tab and upload the ISX file from your local computer. Then, click Create.
    Note: The ISX file must exist on your desktop or network drive. Do not drag the file as an attachment from another application.
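
If you script your imports, recent versions of the cpdctl dsjob CLI also provide a migrate command that imports an ISX file into a project. The following is a sketch with placeholder names; the available options vary by release, so confirm them with cpdctl dsjob migrate --help:

    cpdctl dsjob migrate --project MyProject --file-name /tmp/MyProject_export.isx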

The asset import report contains status information and error messages that you can use to troubleshoot your ISX import. For information on viewing and using the report to troubleshoot, see Asset import report.

Migrate connections

If your migrated jobs contain connections, see Migrating connections in DataStage for information.

Migrate stages

Table 1. Stages and their migration considerations.
Stages Considerations
  • Distributed Transaction
  • CDC Connector
Migration is not currently supported.
  • BW Extraction Pack
  • SAP IDoc Loader
  • SAP BAPI Pack
  • ABAP_EXT_for_R3_PX
  • Load_PACK_for_BW_PX
Migration is not supported. For jobs that include these connectors, review the job design, then re-create the jobs by using new connectors.
The following SAP connectors are available and have more features:
  • SAP Bulk Extract
  • SAP Delta Extract
  • SAP OData
  • SAP HANA
  • XML Input
  • XML Output
  • XML Transformer
See the following topics for considerations:
Custom stages: For considerations, see Uploading the operator library file after you migrate a DataStage flow that contains a custom stage.
Java Transformer: For considerations, see Migrating the Java Transformer stage from traditional DataStage.
  • Web Service Transformer
  • Web Service Client
See the following topic for considerations: Migrating Web Service Transformer and Web Service Client stages from traditional DataStage
  • Information Services Director Input
  • Information Services Director Output
Automatically converted to the Data service connector.
Data Rules: You can use the cpdctl dsjob CLI to migrate a Data Rules stage into DataStage as a Quality Rule. See Enabling the migration of the Data Rules stage as an IBM Knowledge Catalog Quality Rule in DataStage.
PxSurrogateKeyGenerator: When you migrate a job from traditional DataStage that has this stage, and the stage has both input and output links, the stage is automatically converted to the PxSurrogateKeyGeneratorN type of the Surrogate Key Generator stage. After migration, you must manually create a new surrogate key file.
To manually create a new surrogate key file, see the following procedure with example file names:
  1. Compile the migrated job in modern DataStage. An error is produced that says "Could not find key state file: Surrogate_Key_Generator_1." Record the file name for use in the next step.
  2. Create a separate DataStage flow that contains a Surrogate Key Generator (SKG) stage to create a key file. Use the name that you recorded as the value of the Source name field in the properties for the SKG stage in the new flow.
  3. Compile and run the new flow to create the key file Surrogate_Key_Generator_1. Once the key file is created, you can compile and run the migrated job.
Address Verification: Install the reference data files in the ds-storage PVC.
Stored procedure: Stored procedures are migrated to the corresponding platform connector.

All stored procedures on Db2® type connectors are migrated to the standard Db2 connector, including stored procedures for connectors like Db2 for i and Db2 for z/OS®. Manually replace the Db2 connector with the correct connector type and copy over the stored procedure call.

In the following cases, the procedure is left as-is and must be updated after migration to match the new syntax.
  • Input and output parameters cannot be detected
  • User-defined procedures for SAP ASE
For Microsoft SQL Server, to add the ProcMess column to the schema, deselect Procedure status to link before migration and select Add procedure return value to schema after migration. For more information, see Using stored procedures.
Exception: Exception stages are automatically converted to Peek stages. When a Data Rules stage is migrated as a Quality Rule, the Quality Rule can handle exceptions itself.

Review the parameter sets and PROJDEF values

Review your parameter sets and verify that their default values are correct after migration.

PROJDEF parameter sets are created and updated by migration. If you migrate a job with a PROJDEF parameter set, review the PROJDEF parameter set and specify default values for it. Then, within flows and job runs, any parameter value that is set to $PROJDEF resolves to the value from the PROJDEF parameter set.

Update scripts that use the dsjob command line interface

If you have scripts that use dsjob to run jobs, update the script call to dsjob by completing the following steps:
  1. Download cpdctl: https://github.com/IBM/cpdctl/releases/
  2. Create a source shell script (source.sh) to configure cpdctl. Create a text file key.txt for your encryption key. See the following example:
    #!/bin/bash
    # Configure cpdctl for the DataStage (dsjob) command-line interface.
    export CPDCTL_ENCRYPTION_KEY_PATH=~/key.txt
    export DSJOB_URL=https://example.com
    export DSJOB_ZEN_URL=https://example.com
    export CPDCTL_ENABLE_DSJOB=true
    export CPDCTL_ENABLE_DATASTAGE=true
    export DSJOB_USER=admin
    export DSJOB_PWD=<Password>
    # Register the user, profile, and context, then make the context active.
    cpdctl config user set dscpserver-user --username $DSJOB_USER --password $DSJOB_PWD
    cpdctl config profile set dscpserver-profile --url $DSJOB_URL
    cpdctl config context set dscpserver-context --user dscpserver-user --profile dscpserver-profile
    cpdctl config context use dscpserver-context

    # Verify the configuration by listing the projects that you can access.
    cpdctl dsjob list-projects

  3. Change any references to dsjob to cpdctl dsjob. You might need to adjust the command-line options to fit the DataStage command-line style, as shown in the sketch that follows. See DataStage command-line tools.
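    For example, the following sketch shows a traditional dsjob call and a roughly equivalent cpdctl dsjob call. The project, job, and parameter names are placeholders, and the exact options can differ by release, so check cpdctl dsjob run --help:

    # Traditional DataStage engine command:
    #   dsjob -run -jobstatus -param DB_NAME=PROD MyProject MyJob
    # Modern DataStage equivalent (sketch):
    cpdctl dsjob run --project MyProject --job MyJob --param DB_NAME=PROD --wait 300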

Migrate sequence jobs

You can import an ISX file to migrate a sequence job to a pipeline flow. Rewrite expressions in CEL and manually reselect values for some pipeline nodes. See the following topics for more considerations: Orchestrating flows with Watson Pipelines and Migrating and constructing pipeline flows for DataStage. See Migrating BASIC routines in DataStage for information on rewriting BASIC routines as scripts.

Rewrite the routine code for before-job and after-job subroutines

When you migrate before-job and after-job subroutines, the routine code is stored in a .sh script under /ds-storage/projects/<projectName>/scripts/DSU.<RoutineName>.sh. Rewrite the routine code in the same way as a BASIC routine, following the steps in Migrating BASIC routines in DataStage to retrieve the output arguments, but include an exit statement for the before/after-job subroutine. See the following example:
# TODO: Update the following json string and print it as the last line of the standard output.

ErrorCode=0
echo "{\"ErrorCode\":\"$ErrorCode\"}"
exit $ErrorCode

Review the environment variables

APT_CONFIG_FILE

The modern version of DataStage uses dynamic configuration file generation by default. If a migrated flow contains the APT_CONFIG_FILE environment variable, review the variable setting and either remove it or create the configuration file that it references on /px-storage or another accessible persistent volume so that the original job can run. For more information, see Creating and setting the APT_CONFIG_FILE environment variable in DataStage.
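
If you keep the variable, the following is a minimal sketch of creating a one-node configuration file on shared storage. The node name, fastname, and paths are placeholders and must match your environment:

# Sketch: write a one-node parallel configuration file to shared storage.
cat > /px-storage/config/one-node.apt <<'EOF'
{
  node "node1"
  {
    fastname "ds-px-default"
    pools ""
    resource disk "/px-storage/datasets" {pools ""}
    resource scratchdisk "/px-storage/scratch" {pools ""}
  }
}
EOF
# Then set APT_CONFIG_FILE=/px-storage/config/one-node.apt for the flow or project.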

APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING

Go to Advanced under the Stage tab of the Transformer stage and select Legacy null processing.

Update data types, data sets, file sets, dsenv files, and user-defined functions

Data types

Most data types in traditional DataStage map to the same data types in modern DataStage. Data of type Pathname is mapped to type Path.

Data sets and file sets
You can choose between two options:
  • Move the source data sets and file sets that are referenced by the source jobs into the corresponding locations on the target cluster. For example, you can copy data sets into the ds-storage PVC.
  • Find the original jobs that generated the data sets and file sets, update the target location, and rerun those jobs to re-create the referenced data sets and file sets.
The dsjob command-line utility replaces orchadmin. For more information, see the Data set commands in DataStage command-line tools.
dsenv files
Complete the following steps:
  • Add any special environment variables or parameters that are in the dsenv file into the project runtime environment. (A sketch of typical dsenv entries follows this list.)
  • Archive and remove the dsenv file.
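For example, a traditional dsenv file typically exports environment variables such as the following (paths are placeholders). Define equivalent environment variables in the modern project settings instead of carrying the file over:

# Illustrative dsenv entries from traditional DataStage.
# Re-create these as project-level environment variables in modern DataStage.
export ORACLE_HOME=/opt/oracle/product/19c
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
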
User-defined functions

If you are migrating a job that contains a parallel routine from traditional DataStage, you must create a function library to enable user-defined functions in the Transformer stage of modern DataStage.

Create the new function library from the existing .so file that the parallel routine points to. Then, configure the library by setting the return data type for each function that you want to use. For more information, see Uploading the library file before you migrate a DataStage flow that contains user-defined functions.