What's new and changed in DataStage

DataStage® updates can include new features, bug fixes, and security updates. Updates are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM® Cloud Pak for Data.

Refresh 9 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in May 2022.

Operand version: 4.0.9

The 4.0.9 release of DataStage includes the following features and updates:

Connect to more data sources in DataStage
You can now include data from these data sources in your DataStage flows:
  • Generic S3
  • Teradata (optimized)

For the full list of connectors, see DataStage connectors.

Certain connectors now provide a faster way to test and add metadata from their associated connections
When you create the connection, the Test connection button on the Add connection page now works for these connections. (Previously, you did not have a way to test the connection in the user interface.)
  • Apache Kafka
  • Db2® (optimized)
  • Netezza® Performance Server (optimized)
  • ODBC
  • Oracle (optimized)
  • Salesforce.com (optimized)
  • Teradata (optimized)

After you create the connection, in DataStage you can drag the Asset browser to the canvas, select a connection and drill down to add or preview the data for these connectors. (Previously, your only option was to drag a connector to the canvas, double-click it to open its Details card, and then go to Properties > Connection and select the connection.)

  • Db2 (optimized)
  • ODBC

For the full list of connectors, see DataStage connectors.

New stages
The following stages are now available for you to use in DataStage flows:
  • Complex Flat File (CFF)
  • Hierarchical stage: REST step
  • Build stage
  • Match Frequency stage
  • One-source Match stage

For more information and the full list of stages, see DataStage stages and QualityStage stages. For more information on the Build stage, see Defining build stages.

Download a DataStage flow and its dependencies as a single file
You can download an individual DataStage flow and its dependencies conveniently bundled together as a ZIP file. You can then import the file into another project.

Dependencies include items such as connections, subflows, and parameter sets.

For details, see Downloading and importing a DataStage flow and its dependencies.
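
Before you import a downloaded bundle into another project, it can help to check what the ZIP file contains. The following Python sketch lists the entries of a ZIP archive with the standard zipfile module. The entry names and folder layout are made up for illustration, because these notes do not describe the internal structure of the bundle; the functions list_bundle_entries and build_sample_bundle are hypothetical helpers, not part of DataStage.

```python
import io
import zipfile
from typing import List


def list_bundle_entries(bundle: bytes) -> List[str]:
    """Return the sorted entry names inside a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return sorted(archive.namelist())


def build_sample_bundle() -> bytes:
    """Build a stand-in ZIP file.

    The entry names are assumptions for this example only; a real
    DataStage download may be organized differently.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("flows/sample_flow.json", "{}")
        archive.writestr("connections/sample_connection.json", "{}")
        archive.writestr("parameter_sets/sample_params.json", "{}")
    return buffer.getvalue()


entries = list_bundle_entries(build_sample_bundle())
for name in entries:
    print(name)
```

To inspect a real download, read the file from disk and pass its bytes to list_bundle_entries instead of using the sample bundle.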

Security fixes
This release includes fixes for the following security issues:
  • CVE-2018-1000876
  • CVE-2019-10086
  • CVE-2019-9923
  • CVE-2020-1751
  • CVE-2020-1752
  • CVE-2020-1757
  • CVE-2020-27782
  • CVE-2020-36518
  • CVE-2021-23840
  • CVE-2021-29469
  • CVE-2021-33503
  • CVE-2021-3583
  • CVE-2021-35942
  • CVE-2021-3711
  • CVE-2021-3712
  • CVE-2021-37322
  • CVE-2021-41771
  • CVE-2021-41772
  • CVE-2021-43138
  • CVE-2021-44716
  • CVE-2022-0778
  • CVE-2022-23772
  • CVE-2022-23773
  • CVE-2022-23806
  • CVE-2022-24785
  • CVE-2022-24921
  • CVE-2022-27191

Refresh 7 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in March 2022.

Operand version: 4.0.7

The 4.0.7 release of DataStage includes the following features and updates:

New connectors
You can now include data from these data sources in your DataStage flows:
  • IBM MQ
  • Microsoft Azure Cosmos DB
  • Microsoft Azure SQL Database

For the full list of DataStage connectors, see DataStage connectors.

Bug fixes
This release includes the following fixes:
  • Issue: When users set 4 partitions in Environments, only 3 are dedicated to compute.

    Resolution: This issue is now fixed.

Refresh 6 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in February 2022.

Operand version: 4.0.6

The 4.0.6 release of DataStage includes the following features and updates:

New connectors and stages
You can now include data from these data sources in your DataStage flows:
  • Amazon RDS for Oracle
  • Box
  • Compose for MySQL

For more information, see DataStage connectors.

The following stages are now available for you to use in DataStage flows:

  • Combine Records
  • Make Subrecords
  • Make Vector
  • Promote Subrecords
  • Split Subrecord
  • Split Vector

For more information, see DataStage stages.

Bug fixes
This release includes the following fixes:
  • Issue: DataStage fails to compile jobs with a large number of transformers.

    Resolution: This issue is now fixed.

  • Issue: DataStage flows that use NLS Locale with Sequential File as a target fail to run.

    Resolution: -collation_sequence is now supported as an export operator argument.

  • Issue: Runtime job metrics are not correct.

    Resolution: This issue is now fixed.

  • Issue: Jobs hang after cluster restarts.

    Resolution: This issue is now fixed.

  • Issue: Runtime is unstable under heavy loads, causing jobs to fail.

    Resolution: This issue is now fixed.

  • Issue: Log file retention policy is not implemented.

    Resolution: This issue is now fixed.

  • Issue: Connectors are unable to write to CWD (permission denied).

    Resolution: This issue is now fixed.

  • Issue: NLS is not supported for compiling transformers.

    Resolution: This issue is now fixed.

  • Issue: Setting partitions is not supported from the Environment definition.

    Resolution: This issue is now fixed.

  • Issue: External Source fails to run source programs.

    Resolution: This issue is now fixed.

  • Issue: Customized standardization rules are not supported in QualityStage.

    Resolution: This issue is now fixed.

  • Issue: RowGen adds an extra space on CPD.

    Resolution: This issue is now fixed.

  • Issue: Vertical pivot regression does not display updates to the SQL.

    Resolution: This issue is now fixed.

  • Issue: An error is produced when the Esc key is pressed while a message ID is being edited in message handling.

    Resolution: This issue is now fixed.

Security fixes
This release includes fixes for the following security issues:
  • CVE-2021-44832

Refresh 5 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in January 2022.

Operand version: 4.0.5

Bug fixes
This release includes the following fixes:
  • Issue: When a backup/restore is run on a cluster with DataStage, the timeout limits, which are set by default, are not sufficient to quiesce the DataStage component pods and put the DataStage custom resource into maintenance mode.

    Resolution: The timeout value is increased to make sure that there is enough time for all DataStage component pods to be quiesced so backup/restore can continue without errors.

Security fixes
This release includes fixes for the following security issues:
  • CVE-2021-45105
  • CVE-2021-45046

Refresh 4 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in December 2021.

Operand version: 4.0.4

The 4.0.4 release of DataStage includes the following features and updates:

New stage
DataStage now supports the Excel stage to read data from or write data to Excel files.

For details, see Excel stage.

New connectors
DataStage now supports the following additional connectors:
  • Amazon RDS for MySQL
  • Databases for MongoDB
  • MariaDB
  • MongoDB

Additionally, the ODBC connection now includes the Text data source.

For details, see DataStage connectors.

Enhancements to the Address Verification (AVI) stage
The AVI stage now includes the following enhancements:
  • Reverse geocoding: You can provide longitude and latitude coordinates to get an estimated address.
  • Coding Accuracy Support System (CASS) mode: The AVI stage applies CASS rules as defined by the United States Postal Service® (USPS). CASS rules are used to correct and standardize addresses. The rules add missing address information, such as ZIP codes, cities, and states, to ensure the address is complete.

For more information on the AVI stage, see Address Verification stage.

Project asset support
You can now use the asset browser connector in DataStage to add files of type .csv, .txt, .xlsx, .json, and .xml directly to the canvas of a DataStage flow. When you add these files in this way, DataStage automatically adds the files to the canvas in the form of the appropriate stage or connector, as shown in the following list:
  • .csv and .txt: Sequential file connector
  • .xlsx: Excel stage
  • .json and .xml: Hierarchical stage
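
The mapping in the preceding list can be expressed as a small lookup table. The following Python sketch is only an illustration of that mapping: the function stage_for_file is a hypothetical helper, not a DataStage API, and the case-insensitive match on the extension is an assumption that these notes do not confirm.

```python
from pathlib import PurePosixPath
from typing import Optional

# File extension -> stage or connector that is added to the canvas,
# as listed in the release notes.
EXTENSION_TO_STAGE = {
    ".csv": "Sequential file connector",
    ".txt": "Sequential file connector",
    ".xlsx": "Excel stage",
    ".json": "Hierarchical stage",
    ".xml": "Hierarchical stage",
}


def stage_for_file(filename: str) -> Optional[str]:
    """Return the stage or connector for a file name, or None if unsupported.

    Hypothetical helper for illustration. Lowercasing the extension is an
    assumption; the release notes do not say how case is handled.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_TO_STAGE.get(suffix)


for name in ["orders.csv", "notes.txt", "report.xlsx", "payload.json", "feed.xml"]:
    print(name, "->", stage_for_file(name))
```

A file type that is not in the list, such as a .parquet file, returns None in this sketch.
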
Multiple conductor nodes
You can optionally enable dynamic workload management to use multiple conductor nodes to scale DataStage horizontally. Horizontal scaling provides better workload balancing and availability.

For details, see Enabling multiple conductor (PXRuntime) pods in DataStage.

Bug fixes
  • Issue: Job metrics not populated for sequential file stage.

    Resolution: This issue is now fixed.

  • Issue: Russian messages for px-runtime and ds-runtime not integrated.

    Resolution: This issue is now fixed.

  • Issue: Error messages displayed when WLM is not running.

    Resolution: This issue is now fixed.

  • Issue: Job status is stuck in "starting" state.

    Resolution: This issue is now fixed.

  • Issue: Null pointer exception occurs while runtime environments are being updated.

    Resolution: This issue is now fixed.

  • Issue: Redis errors displayed in ds-runtime service.

    Resolution: This issue is now fixed.

  • Issue: Job start time delay.

    Resolution: This issue is now fixed.

  • Issue: Severe errors in logTail API.

    Resolution: This issue is now fixed.

  • Issue: Support for multiple px-runtime replicas per runtime instance.

    Resolution: This issue is now fixed.

  • Issue: WLM functionality around pod scaling.

    Resolution: This issue is now fixed.

  • Issue: Asset browser functions on Canvas.

    Resolution: This issue is now fixed.

  • Issue: Java integration found in Connectors section.

    Resolution: Java integration is now found in the Stages section.

  • Issue: Issue with browser buttons in Java Integration.

    Resolution: This issue is now fixed.

  • Issue: Editing transformer derivation changes data type to Varchar.

    Resolution: This issue is now fixed.

  • Issue: Group labels are visible with selection.

    Resolution: This issue is now fixed.

  • Issue: Issue with navigation related to Skytap.

    Resolution: This issue is now fixed.

  • Issue: UI crashes when a Transformer stage with no input columns is opened.

    Resolution: This issue is now fixed.

  • Issue: Missing Longitude/Latitude mapping columns.

    Resolution: This issue is now fixed.

  • Issue: Dry run support needed for cloudctl case launch actions: installCatalog, installOperator, uninstallCatalog, uninstallOperator.

    Resolution: Support added.

  • Issue: Unable to select delimiter and input test string for PREP rule test.

    Resolution: This issue is now fixed.

  • Issue: After the pattern is changed, the output stage columns are missing. The columns exist in the Investigate stage.

    Resolution: This issue is now fixed.

  • Issue: After the pattern report is disabled, the output column of the token report is lost.

    Resolution: This issue is now fixed.

  • Issue: rsh / remsh command call in Standardize Operator.

    Resolution: This issue is now fixed.

  • Issue: Support to select ruleset derived output columns in output tearsheet.

    Resolution: This issue is now fixed.

  • Issue: If the column is selected before the rule, the selected rule cannot be displayed.

    Resolution: This issue is now fixed.

  • Issue: Standardize flow fails to run and an error message is displayed.

    Resolution: This issue is now fixed.

  • Issue: Handling option is not disabled for columns.

    Resolution: This issue is now fixed.

  • Issue: After the import process is finished, the CNPhone ruleset is deleted.

    Resolution: This issue is now fixed.

  • Issue: The customized rulesets are displayed but cannot be used.

    Resolution: This issue is now fixed.

  • Issue: COUNTRY ruleset has more handling options in Canvas, but the option is disabled in Legacy.

    Resolution: This issue is now fixed.

  • Issue: The flow compile fails. Expanding the log details leads to an error.

    Resolution: This issue is now fixed.

  • Issue: Literal string is missing a character when a process is edited that has only literals.

    Resolution: This issue is now fixed.

  • Issue: Cannot modify the columns of an existing ruleset.

    Resolution: This issue is now fixed.

  • Issue: Cannot delete a literal operation.

    Resolution: This issue is now fixed.

  • Issue: If the ruleset has "Handling options" and "Handling options" is also selected, the literal column name cannot be modified.

    Resolution: This issue is now fixed.

Security fixes
This release includes fixes for the following security issues:
  • CVE-2021-44228

Refresh 3 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in November 2021.

Operand version: 4.0.3

The 4.0.3 release of DataStage includes the following features and updates:

New stages
DataStage includes new stages, which give you more tools to process your data:
  • Address Verification
  • Hierarchical (XML)
  • Java™ Integration
  • Pivot Enterprise
  • Surrogate Key Generator

For more information, see DataStage stages.

New connectors
DataStage includes new connectors:
  • Google Cloud Pub/Sub
  • MySQL

For more information, see DataStage connectors.

Reusable components
You can create components that you use in projects and in DataStage flows. You create these components in a project, outside of a DataStage flow, which gives you the flexibility to reuse the components in separate places. The components are stored as assets in your project. You can create the following components:
  • Data definitions
  • Parameter sets
  • Subflows

For more information, see the following topics:

Bug fixes
  • Issue: Missing the label of the restore button for other options.

    Resolution: The issue is now fixed.

  • Issue: The stage operator panel cannot be opened for the Investigate and Standardize stages.

    Resolution: The issue is now fixed.

  • Issue: [TVT] Translated text is not shown in the log tab.

    Resolution: This issue is now fixed.

  • Issue: Flows that include word rule set run successfully with a warning message.

    Resolution: This issue is now fixed.

  • Issue: The "output link(s) is undefined" message is not a normal prompt message dialog.

    Resolution: This issue is now fixed.

  • Issue: No error message results from adding a column without inputting a column name.

    Resolution: This issue is now fixed.

  • Issue: The operator does not wait until all pods are ready to set the CR status when the CR name is not datastage.

    Resolution: This issue is now fixed.

  • Issue: The operator does not detect CR changes properly to rerun the playbook.

    Resolution: This issue is now fixed.

  • Issue: Clusters with FIPS enabled cannot run jobs.

    Resolution: This issue is now fixed.

  • Issue: A job crash does not result in a fatal error being logged.

    Resolution: This issue is now fixed.

  • Issue: Environment variables set by default can't be unset.

    Resolution: This issue is now fixed.

  • Issue: Logs are not returned via API call when transform compilation fails.

    Resolution: This issue is now fixed.

  • Issue: Runtime is unstable on cluster restart.

    Resolution: This issue is now fixed.

  • Issue: Environment picker issues: environments list not sorted, dropdown not fully visible.

    Resolution: This issue is now fixed.

  • Issue: Removing a Filter > Output column modifies the prior stage Transformer > Output derivation.

    Resolution: This issue is now fixed.

  • Issue: Parameters: the Save tooltip is missing.

    Resolution: This issue is now fixed.

  • Issue: Data definitions: the Save and Import tooltips are missing.

    Resolution: This issue is now fixed.

  • Issue: JDBC: The Insert statement text box is not long enough to show the long insert statement with new lines.

    Resolution: This issue is now fixed.

  • Issue: Canvas throws an error while loading.

    Resolution: This issue is now fixed.

  • Issue: Canvas crashes when a node is deleted and added back to the flow.

    Resolution: This issue is now fixed.

  • Issue: The Save button doesn't work when editing and saving a target stage in a flow.

    Resolution: This issue is now fixed.

  • Issue: Canvas UI displays an error when trying to save the RedShift property.

    Resolution: This issue is now fixed.

  • Issue: ISX file import report - Assets data table layout, spacing, content and component issues.

    Resolution: This issue is now fixed.

  • Issue: Old columns are not cleaned up when a link is deleted and added back.

    Resolution: This issue is now fixed.

  • Issue: Column Export crashes when a column name is input without incoming input columns.

    Resolution: This issue is now fixed.

  • Issue: Parameter set: row [x] remains selected after the row's parameter is deleted.

    Resolution: This issue is now fixed.

  • Issue: Lookup fileset: the page crashes when the Add key button is clicked on the Lookup keys tear sheet.

    Resolution: This issue is now fixed.

  • Issue: Changing the data type of an output column in a previous stage deletes that column in the following Transformer.

    Resolution: This issue is now fixed.

Refresh 2 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in October 2021.

Operand version: 4.0.2

In the 4.0.2 release, the DataStage service has been redesigned and modernized.

With DataStage, you can design and run data flows that move and transform data anywhere, at any scale.

No matter how complex your data landscape, DataStage can streamline your data movement costs and increase productivity. DataStage offers:
  • A best-in-breed parallel processing engine that enables you to process your data where it resides
  • Automated job design
  • Simple integration with cloud data lakes, real-time data sources, relational databases, big data, and NoSQL data stores

The DataStage service uses Cloud Pak for Data platform connections and integration points, with services like Data Virtualization, to simplify the process of connecting to and accessing your data.

With DataStage, Data Engineers can use the simple user interface to build no-code/low-code data pipelines. The interface offers hundreds of functions and connectors that reduce development time and inconsistencies across pipelines. The interface also makes it easy to collaborate with your peers and control access to specific analytics projects.

The service also provides automatic workload balancing to deliver high-performance pipelines that make efficient use of available compute resources.

Refresh 1 of Cloud Pak for Data Version 4.0

A new version of DataStage was released in August 2021.

Operand version: 4.0.1

The 4.0.1 release of DataStage includes the following features and updates:
Support for upgrade
You can now upgrade DataStage from the following Cloud Pak for Data releases:
  • Cloud Pak for Data Version 3.5.x
  • Cloud Pak for Data Version 4.0.x
Bug fixes
  • Issue: There's a limit of 50 GB of data for the Parquet file format.

    Resolution: 200 GB of data is now allowed for the Parquet file format.

  • Issue: When the jobs tab is opened, the users' information is unnecessarily loaded, which blocks the UI.

    Resolution: This issue was resolved with a code fix.

  • Issue: When a job is scheduled and then later deleted, the job run is still triggered. The deletion of the schedule is not taking effect.

    Resolution: This issue was resolved with a code fix.

Initial release of Cloud Pak for Data Version 4.0

Version 4.0.0 of the DataStage Enterprise and DataStage Enterprise Plus services is available on Cloud Pak for Data 4.0. This release of DataStage Enterprise and DataStage Enterprise Plus does not include new features or updates.

Assembly version: 4.0