What's new and changed in DataStage
DataStage® updates can include new features, bug fixes, and security updates. Updates are listed in reverse chronological order so that the latest release is at the beginning of the topic.
You can see a list of the new features for the platform and all of the services at What's new in IBM® Cloud Pak for Data.
Installing or upgrading DataStage
Ready to install or upgrade DataStage?
Refresh 9 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in May 2022.
Operand version: 4.0.9
The 4.0.9 release of DataStage includes the following features and updates:
- Connect to more data sources in DataStage
- You can now include data from these data sources in your DataStage flows:
- Generic S3
- Teradata (optimized)
For the full list of connectors, see DataStage connectors.
- Certain connectors now provide a faster way to test and add metadata from their associated connections
- When you create the connection, the Test connection button on the Add connection page now works for these connections. (Previously, there was no way to test the connection in the user interface.)
- Apache Kafka
- Db2® (optimized)
- Netezza® Performance Server (optimized)
- ODBC
- Oracle (optimized)
- Salesforce.com (optimized)
- Teradata (optimized)
After you create the connection, in DataStage you can drag the Asset browser to the canvas, select a connection, and drill down to add or preview the data for these connectors. (Previously, your only option was to drag a connector to the canvas, double-click it to open its Details card, and then go to Properties > Connection and select the connection.)
- Db2 (optimized)
- ODBC
- New stages
- The following stages are now available for you to use in DataStage flows:
- Complex Flat File (CFF)
- Hierarchical stage: REST step
- Build stage
- Match Frequency stage
- One-source Match stage
For more information and the full list of stages, see DataStage stages and QualityStage stages. For more information on the Build stage, see Defining build stages.
- Download a DataStage flow and its dependencies as a single file
- You can download an individual DataStage flow and its dependencies conveniently bundled together as a ZIP file. You
can then import the file into another project.
Dependencies include items such as connections, subflows, and parameter sets.
For details, see Downloading and importing a DataStage flow and its dependencies.
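The downloaded bundle is a standard ZIP archive, so you can inspect which dependencies it contains before importing it into another project. The following sketch is illustrative only: it builds a toy archive standing in for a downloaded bundle (the archive and entry names are hypothetical, not the actual layout DataStage produces) and lists its entries with Python's standard zipfile module.

```python
import io
import zipfile

# Build a toy in-memory archive standing in for a downloaded flow bundle.
# The entry names below are hypothetical examples of a flow plus its
# dependencies (a parameter set and a connection).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("my_flow.json", "{}")
    zf.writestr("parameter_sets/ps1.json", "{}")
    zf.writestr("connections/db2.json", "{}")

# List every entry before importing the bundle into another project.
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        print(name)
```

The same listing works on a real downloaded file by opening it with `zipfile.ZipFile("flow.zip")` instead of the in-memory buffer.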
- Security fixes
- This release includes fixes for the following security issues:
- CVE-2018-1000876
- CVE-2019-10086
- CVE-2019-9923
- CVE-2020-1751
- CVE-2020-1752
- CVE-2020-1757
- CVE-2020-27782
- CVE-2020-36518
- CVE-2021-23840
- CVE-2021-29469
- CVE-2021-33503
- CVE-2021-3583
- CVE-2021-35942
- CVE-2021-3711
- CVE-2021-3712
- CVE-2021-37322
- CVE-2021-41771
- CVE-2021-41772
- CVE-2021-43138
- CVE-2021-44716
- CVE-2022-0778
- CVE-2022-23772
- CVE-2022-23773
- CVE-2022-23806
- CVE-2022-24785
- CVE-2022-24921
- CVE-2022-27191
Refresh 7 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in March 2022.
Operand version: 4.0.7
The 4.0.7 release of DataStage includes the following features and updates:
- New connectors
- You can now include data from these data sources in your DataStage flows:
- IBM MQ
- Microsoft Azure Cosmos DB
- Microsoft Azure SQL Database
For the full list of DataStage connectors, see DataStage connectors.
- Bug fixes
- This release includes the following fixes:
- Issue: When users set 4 partitions in Environments, only 3 are dedicated to compute.
Resolution: This issue is now fixed.
Refresh 6 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in February 2022.
Operand version: 4.0.6
The 4.0.6 release of DataStage includes the following features and updates:
- New connectors and stages
- You can now include data from these data sources in your DataStage flows:
- Amazon RDS for Oracle
- Box
- Compose for MySQL
For more information, see DataStage connectors.
The following stages are now available for you to use in DataStage flows:
- Combine Records
- Make Subrecords
- Make Vector
- Promote Subrecords
- Split Subrecord
- Split Vector
For more information, see DataStage stages.
- Bug fixes
- This release includes the following fixes:
- Issue: DataStage fails to compile
jobs with a large number of transformers.
Resolution: This issue is now fixed.
- Issue: DataStage flows that use
NLS Locale with Sequential File as a target fail to run.
Resolution: -collation_sequence is now supported as an export operator argument.
- Issue: Runtime job metrics are not correct.
Resolution: This issue is now fixed.
- Issue: Jobs hang after cluster restarts.
Resolution: This issue is now fixed.
- Issue: Runtime is unstable under heavy loads, causing jobs to fail.
Resolution: This issue is now fixed.
- Issue: Log file retention policy is not implemented.
Resolution: This issue is now fixed.
- Issue: Connectors are unable to write to CWD (permission denied).
Resolution: This issue is now fixed.
- Issue: NLS is not supported for compiling transformers.
Resolution: This issue is now fixed.
- Issue: Setting partitions is not supported from the Environment definition.
Resolution: This issue is now fixed.
- Issue: External Source fails to run source programs.
Resolution: This issue is now fixed.
- Issue: Customized standardization rules are not supported in QualityStage.
Resolution: This issue is now fixed.
- Issue: RowGen adds an extra space on CPD.
Resolution: This issue is now fixed.
- Issue: Vertical pivot regression does not display updates to the SQL.
Resolution: This issue is now fixed.
- Issue: An error is produced when the Esc key is pressed while a message ID is being
edited in message handling.
Resolution: This issue is now fixed.
- Security fixes
- This release includes fixes for the following security issues:
- CVE-2021-44832
Refresh 5 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in January 2022.
Operand version: 4.0.5
- Bug fixes
- This release includes the following fixes:
- Issue: When a backup/restore is run on a cluster with DataStage, the timeout limits, which are set by
default, are not sufficient to quiesce the DataStage component pods and put the DataStage custom resource into maintenance mode.
Resolution: The timeout value is increased to make sure that there is enough time for all DataStage component pods to be quiesced so backup/restore can continue without errors.
- Security fixes
- This release includes fixes for the following security issues:
- CVE-2021-45105
- CVE-2021-45046
Refresh 4 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in December 2021.
Operand version: 4.0.4
The 4.0.4 release of DataStage includes the following features and updates:
- New stage
- DataStage now supports the Excel stage
to read data from or write data to Excel files.
For details, see Excel stage.
- New connectors
- DataStage now supports the following
additional connectors:
- Amazon RDS for MySQL
- Databases for MongoDB
- MariaDB
- MongoDB
Additionally, the ODBC connection now includes the Text data source.
For details, see DataStage connectors.
- Enhancements to the Address Verification (AVI) stage
- The AVI stage now includes the following enhancements:
- Reverse geocoding
- You can provide longitude and latitude coordinates to get an estimated address.
- Coding Accuracy Support System (CASS) mode
- The AVI stage applies CASS rules as defined by the United States Postal Service® (USPS).
CASS rules are used to correct and standardize addresses. The rules add missing address information, such as ZIP codes, cities, and states, to ensure the address is complete.
- Project asset support
- You can now use the asset browser connector in DataStage to add files of type .csv, .txt, .xlsx,
.json, and .xml directly to the canvas of a DataStage flow. When you add these files in this way,
DataStage automatically adds the files to
the canvas in the form of the appropriate stage or connector, as shown in the following list:
- .csv and .txt: Sequential file connector
- .xlsx: Excel stage
- .json and .xml: Hierarchical stage
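The file-type mapping above can be summarized as a simple lookup. This sketch is illustrative only: the dictionary and function are hypothetical, not a DataStage API, and merely restate the extension-to-stage rules from the list.

```python
import os

# Hypothetical lookup table restating the mapping described above;
# not part of any DataStage API.
EXTENSION_TO_STAGE = {
    ".csv": "Sequential file connector",
    ".txt": "Sequential file connector",
    ".xlsx": "Excel stage",
    ".json": "Hierarchical stage",
    ".xml": "Hierarchical stage",
}

def stage_for_file(filename: str) -> str:
    """Return the stage or connector that would represent the file."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_STAGE.get(ext, "unsupported")

print(stage_for_file("sales.csv"))    # Sequential file connector
print(stage_for_file("report.xlsx"))  # Excel stage
```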
- Multiple conductor nodes
- You can optionally enable dynamic workload management to use multiple conductor nodes to scale
DataStage horizontally. Horizontal scaling
provides better workload balancing and availability.
For details, see Enabling multiple conductor (PXRuntime) pods in DataStage.
- Bug fixes
- This release includes the following fixes:
- Issue: Job metrics not populated for sequential file
stage.
Resolution: This issue is now fixed.
- Issue: Russian messages for px-runtime and ds-runtime not
integrated.
Resolution: This issue is now fixed.
- Issue: Error messages displayed when WLM is not
running.
Resolution: This issue is now fixed.
- Issue: Job status is stuck in "starting" state.
Resolution: This issue is now fixed.
- Issue: Null pointer exception occurs while runtime environments are being
updated.
Resolution: This issue is now fixed.
- Issue: Redis errors displayed in ds-runtime service.
Resolution: This issue is now fixed.
- Issue: Job start time delay.
Resolution: This issue is now fixed.
- Issue: Severe errors in logTail API.
Resolution: This issue is now fixed.
- Issue: Support for multiple px-runtime replicas per runtime
instance.
Resolution: This issue is now fixed.
- Issue: WLM functionality around pod scaling.
Resolution: This issue is now fixed.
- Issue: Asset browser functions on Canvas.
Resolution: This issue is now fixed.
- Issue: Java integration found in Connectors section.
Resolution: Java integration is now found in the Stages section.
- Issue: Issue with browser buttons in Java Integration.
Resolution: This issue is now fixed.
- Issue: Editing transformer derivation changes data type to
Varchar.
Resolution: This issue is now fixed.
- Issue: Group labels are visible with selection.
Resolution: This issue is now fixed.
- Issue: Issue with navigation related to skytap.
Resolution: This issue is now fixed.
- Issue: UI crashes during Transformer stage with no input columns
opened.
Resolution: This issue is now fixed.
- Issue: Missing Longitude/Latitude mapping columns.
Resolution: This issue is now fixed.
- Issue: Dry run support needed for cloudctl case launch actions:
installCatalog, installOperator, uninstallCatalog, uninstallOperator.
Resolution: Support added.
- Issue: Unable to select delimiter and input test string for PREP rule
test.
Resolution: This issue is now fixed.
- Issue: After the pattern is changed, the output stage columns are missing.
The columns exist in the Investigate stage.
Resolution: This issue is now fixed.
- Issue: After the pattern report is disabled, the output column of the token
report is lost.
Resolution: This issue is now fixed.
- Issue: rsh / remsh command call in Standardize
Operator.
Resolution: This issue is now fixed.
- Issue: Support to select ruleset derived output columns in output
tearsheet.
Resolution: This issue is now fixed.
- Issue: If the column is selected before the rule, the selected rule cannot
be displayed.
Resolution: This issue is now fixed.
- Issue: Standardize flow fails to run and an error message is
displayed.
Resolution: This issue is now fixed.
- Issue: Handling option is not disabled for columns.
Resolution: This issue is now fixed.
- Issue: After the import process is finished, the CNPhone ruleset is
deleted.
Resolution: This issue is now fixed.
- Issue: The customized rulesets are displayed but cannot be
used.
Resolution: This issue is now fixed.
- Issue: COUNTRY ruleset has more handling options in Canvas, but the option is disabled in Legacy.
Resolution: This issue is now fixed.
- Issue: The flow compile fails. Expanding the log details leads to an
error.
Resolution: This issue is now fixed.
- Issue: Literal string is missing a character when a process is edited that
has only literals.
Resolution: This issue is now fixed.
- Issue: Cannot modify the columns of an existing
ruleset.
Resolution: This issue is now fixed.
- Issue: Cannot delete a literal operation.
Resolution: This issue is now fixed.
- Issue: If the ruleset has "Handling options" and "Handling options" is also
selected, the literal column name cannot be modified.
Resolution: This issue is now fixed.
- Security fixes
- This release includes fixes for the following security issues:
- CVE-2021-44228
Refresh 3 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in November 2021.
Operand version: 4.0.3
The 4.0.3 release of DataStage includes the following features and updates:
- New stages
- DataStage includes new stages, which
give you more tools to process your data:
- Address Verification
- Hierarchical (XML)
- Java™ Integration
- Pivot Enterprise
- Surrogate Key Generator
- New connectors
- DataStage includes new connectors:
- Google Cloud Pub/Sub
- MySQL
- Reusable components
- You can create components that you use in projects and in DataStage flows. You create these components in a
project, outside of a DataStage flow, which gives you
the flexibility to reuse the components in separate places. The components are stored as assets in
your project. You can create the following components:
- Data definitions
- Parameter sets
- Subflows
- Bug fixes
- This release includes the following fixes:
- Issue: Missing the label of the restore button for other options.
Resolution: The issue is now fixed.
- Issue: The stage operator panel cannot be opened for the Investigate and Standardize
stages.
Resolution: The issue is now fixed.
- Issue: [TVT] Translated text is not shown in the log tab.
Resolution: This issue is now fixed.
- Issue: Flows that include a word rule set run successfully but produce a warning message.
Resolution: This issue is now fixed.
- Issue: The "output link(s) is undefined" message is not a normal prompt message dialog.
Resolution: This issue is now fixed.
- Issue: No error message results from adding a column without inputting a column
name.
Resolution: This issue is now fixed.
- Issue: The operator does not wait until all pods are ready to set the CR status when the CR name is not datastage.
Resolution: This issue is now fixed.
- Issue: The operator does not detect CR changes properly to rerun the
playbook.
Resolution: This issue is now fixed.
- Issue: Clusters with FIPS enabled cannot run jobs.
Resolution: This issue is now fixed.
- Issue: Job crashing does not result in fatal error
logged.
Resolution: This issue is now fixed.
- Issue: Environment variables set by default can't be unset.
Resolution: This issue is now fixed.
- Issue: Logs are not returned via API call when transform compilation
fails.
Resolution: This issue is now fixed.
- Issue: Runtime is unstable on cluster restart.
Resolution: This issue is now fixed.
- Issue: Environment picker issues: environments list not sorted, dropdown
not fully visible.
Resolution: This issue is now fixed.
- Issue: Removing a Filter > Output column modifies the prior stage
Transformer > Output derivation.
Resolution: This issue is now fixed.
- Issue: Parameters: the Save tooltip is missing.
Resolution: This issue is now fixed.
- Issue: Data definitions: the Save and Import tooltips are missing.
Resolution: This issue is now fixed.
- Issue: JDBC: The Insert statement text box is not long enough to show the
long insert statement with new lines.
Resolution: This issue is now fixed.
- Issue: Canvas throws an error while loading.
Resolution: This issue is now fixed.
- Issue: Canvas crashes when a node is deleted and added back to the
flow.
Resolution: This issue is now fixed.
- Issue: The Save button doesn't work when editing and saving a target stage
in a flow.
Resolution: This issue is now fixed.
- Issue: Canvas UI displays an error when trying to save the RedShift
property.
Resolution: This issue is now fixed.
- Issue: ISX file import report - Assets data table layout, spacing, content
and component issues.
Resolution: This issue is now fixed.
- Issue: Old columns are not cleaned up when a link is deleted and added
back.
Resolution: This issue is now fixed.
- Issue: Column Export crashes when a column name is input without incoming
input columns.
Resolution: This issue is now fixed.
- Issue: Parameter set: row [x] remains selected after the row's parameter is
deleted.
Resolution: This issue is now fixed.
- Issue: Lookup fileset: the page crashes when the Add key button is clicked
on the Lookup keys tear sheet.
Resolution: This issue is now fixed.
- Issue: Changing the data type of an output column in a previous stage
deletes that column in the following Transformer.
Resolution: This issue is now fixed.
Refresh 2 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in October 2021.
Operand version: 4.0.2
With DataStage, you can design and run data flows that move and transform data anywhere, at any scale. The service provides:
- A best-in-breed parallel processing engine that enables you to process your data where it resides
- Automated job design
- Simple integration with cloud data lakes, real-time data sources, relational databases, big data, and NoSQL data stores
The DataStage service uses Cloud Pak for Data platform connections and integration points, with services like Data Virtualization, to simplify the process of connecting to and accessing your data.
With DataStage, Data Engineers can use the simple user interface to build no-code/low-code data pipelines. The interface offers hundreds of functions and connectors that reduce development time and inconsistencies across pipelines. The interface also makes it easy to collaborate with your peers and control access to specific analytics projects.
The service also provides automatic workload balancing to provide high performance pipelines that make efficient use of available compute resources.
Refresh 1 of Cloud Pak for Data Version 4.0
A new version of DataStage was released in August 2021.
Operand version: 4.0.
- Support for upgrade
- You can now upgrade DataStage from the
following Cloud Pak for Data releases:
- Cloud Pak for Data Version 3.5.x
- Cloud Pak for Data Version 4.0.x
- Bug fixes
- This release includes the following fixes:
- Issue: There's a limit of 50GB of data for the Parquet file format.
Resolution: 200GB of data is now allowed for the Parquet file format.
- Issue: When the jobs tab is opened, the users' information is unnecessarily loaded, which
blocks the UI.
Resolution: The issue is resolved with a code fix.
- Issue: When a job is scheduled and then later deleted, the job run is still triggered.
The deletion of the schedule is not taking effect.
Resolution: This issue was resolved with a code fix.
Initial release of Cloud Pak for Data Version 4.0
Assembly version: 4.0