What's new and changed in Data Refinery

Data Refinery updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM Software Hub.

This release includes the following changes:

New features

This release of Data Refinery includes the following features:

Use new attribute to create Data Refinery flows externally without the UI

You can now use the new shaperAPICreated attribute to create Data Refinery flows programmatically instead of in the UI. With this capability, you can:

Use external APIs to create Data Refinery flows.
Use third-party integrations to generate flows with shaping operations.
Use automated workflows to create data transformation pipelines.
Use custom applications to build Data Refinery flows.

For more information, see the API documentation: <hostname:port_number>/v2/data_flow_spark/docs/swagger/index.html
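The following minimal sketch shows what preparing such a request from a custom application might look like. It assumes a JSON POST with a bearer token. The host, the endpoint path (/v2/data_flows), the project_id query parameter, the "operations" field, and the position of shaperAPICreated in the body are all hypothetical placeholders, not the documented API. Only the attribute name shaperAPICreated comes from this release note, so check the Swagger page for the real schema. The sketch builds the request but does not send it.

```python
import json
import urllib.request

# HYPOTHETICAL values: replace them with the real host, path, and schema
# documented at <hostname:port_number>/v2/data_flow_spark/docs/swagger/index.html
BASE_URL = "https://cpd.example.com"  # hypothetical host
ENDPOINT = "/v2/data_flows"           # hypothetical endpoint path
PROJECT_ID = "your-project-id"        # hypothetical project ID


def build_flow_payload(name: str, operations: list) -> dict:
    """Build a request body for a Data Refinery flow created outside the UI.

    Only the attribute name shaperAPICreated comes from the release note;
    its placement and every other field here are assumptions.
    """
    return {
        "name": name,
        "shaperAPICreated": True,  # flag the flow as created through the API
        "operations": operations,  # hypothetical list of shaping operations
    }


def build_create_request(payload: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON POST request."""
    url = f"{BASE_URL}{ENDPOINT}?project_id={PROJECT_ID}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = build_flow_payload("clean_customers", [{"op": "remove_duplicates"}])
    req = build_create_request(payload, token="<bearer-token>")
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```

Keeping payload construction separate from the request makes it easy to generate many flows from an automated workflow and to adjust the body once you confirm the real schema.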

Create Data Refinery flows in folders: You can now create Data Refinery flows in folders or save existing flows to folders. The information panel shows the folder paths for the flow and for the target that you chose. You can also create jobs in folders and change the flow and target folder paths in the flow settings.
For more information, see Managing Data Refinery flows.

Define parameters for source and target data in Data Refinery flows: A new parameter step is now available in the job creation wizard for Data Refinery flows. You can define parameters for both source and target data so that the same job can be used with different data sets. You can also edit existing jobs to use parameters to define source and target data.
For more information, see Creating jobs in Data Refinery.

Cancel Data Refinery jobs in the Starting state: You can now cancel Data Refinery jobs while they are in the Starting state, which gives you more control over your jobs and the resources that they use.

New connections for Data Refinery

You can now use the following connections with Data Refinery:

Vertica
Microsoft Azure Databricks

For more information, see Supported data sources for Data Refinery.

Updates

The following updates were introduced in this release:

The new Default Spark 3.5 & R 4.3 environment is added.
The Default Spark 3.4 & R 4.3 environment is discontinued.

You can now select Default Spark 3.5 & R 4.3 when you select an environment for a Data Refinery flow job.

If you are upgrading from an earlier product version that has Data Refinery flow jobs that use a discontinued environment, a deprecated environment, or a custom Spark 3.x environment, change those jobs to use the new Default Spark 3.5 & R 4.3 environment. Use the new environment for new jobs.

For more information, see Data Refinery environments.

Customer-reported issues fixed in this release

For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.