What's new and changed in Data Refinery

The Data Refinery release and subsequent refreshes can include new features, bug fixes, and security updates. Refreshes appear in reverse chronological order, and only the refreshes that contain updates for Data Refinery are shown.

For a list of the new features for the platform and all of the services, see What's new in IBM Cloud Pak for Data?

Installing or upgrading Data Refinery

Data Refinery is not separately installable. Data Refinery is installed or upgraded when you install or upgrade Watson™ Knowledge Catalog or Watson Studio.

Related documentation:
  • Watson Knowledge Catalog
  • Watson Studio

Refresh 5 of Cloud Pak for Data Version 3.5

A new version of Data Refinery was released in April 2021.

Assembly version: 3.5.4

This release includes the following changes:
Security fixes

This release includes fixes for the following security issues.

CVE-2016-6811, CVE-2017-15713, CVE-2017-15718, CVE-2017-3166, CVE-2018-1296, CVE-2018-14718, CVE-2018-7489, CVE-2018-8029, CVE-2019-16869, CVE-2019-20444, CVE-2019-20445, CVE-2020-10673, CVE-2020-25649, CVE-2020-35490, CVE-2020-35491, CVE-2020-9492, CVE-2021-21290, CVE-2021-21295, CVE-2021-21409, CVE-2021-2163, CVE-2021-23368

Refresh 2 of Cloud Pak for Data Version 3.5

A new version of Data Refinery was released in January 2021.

Assembly version: 3.5.2

This release includes the following changes:

Bug fixes
This release includes the following fix:
  • Issue: Cannot specify format options after you change the Data Refinery flow source. When data is read into Data Refinery, you can scroll down to the SOURCE FILE information at the bottom of the page and click the “Specify data format” icon to specify format options for CSV or delimited files. However, if you change the source of a Data Refinery flow, this option is not available.

    Resolution: This problem is fixed.

Initial release of Cloud Pak for Data Version 3.5

A new version of Data Refinery was released as part of Cloud Pak for Data Version 3.5.

Assembly version: 3.5.0

This release includes the following changes:

New features
Use personal credentials for connections
If you create a connection and select the Personal credentials option, other users can use that connection only if they supply their own credentials for the data source.
Users who have credentials for the underlying data source can:
  • Select the connection to create a Data Refinery flow
  • Edit or change a location when modifying a Data Refinery flow
  • Select a data source for the Join operation

For information about creating a project-level connection with personal credentials, see Adding connections to analytics projects.

Use the Union operation to combine rows from two data sets that share the same schema


The Union operation is in the ORGANIZE category. For more information, see GUI operations in Data Refinery.
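
Data Refinery's interactive code templates use dplyr-style R syntax (the filter template later on this page is one example), so a rough way to picture the Union operation is a dplyr union of two data frames that have the same columns. In the sketch below, the data frame names and values are made up, and the actual code that the GUI operation generates might differ:

  # Sketch only: combining rows from two data sets that share the same schema,
  # in plain dplyr (hypothetical data frame names and values)
  library(dplyr)

  q1_orders <- data.frame(order_id = c(1, 2), region = c("EMEA", "APAC"))
  q2_orders <- data.frame(order_id = c(3, 2), region = c("AMER", "APAC"))

  dplyr::union(q1_orders, q2_orders)      # combined rows, duplicate rows removed
  dplyr::union_all(q1_orders, q2_orders)  # combined rows, duplicate rows kept

Whether the GUI operation keeps or removes duplicate rows is not covered here, so both dplyr variants are shown.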

Perform aggregate calculations on multiple columns in Data Refinery
You can now select multiple columns in the Aggregate operation. Previously, aggregate calculations applied to only one column at a time.


The Aggregate operation is in the ORGANIZE category. For more information, see Aggregate in GUI operations in Data Refinery.
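
In the same dplyr-style terms, an Aggregate step that applies the same calculations to several columns at once might look roughly like the sketch below. The column names and summary functions are assumptions for illustration, and the code that Data Refinery actually generates may differ:

  # Sketch only: one aggregation step over multiple columns with dplyr
  # (hypothetical column names; requires dplyr 1.0 or later for across())
  library(dplyr)

  sales <- data.frame(
    region   = c("EMEA", "EMEA", "APAC"),
    quantity = c(10, 4, 7),
    revenue  = c(250.0, 90.5, 180.0)
  )

  sales %>%
    group_by(region) %>%
    summarize(across(c(quantity, revenue), list(total = sum, mean = mean)))
  # Produces quantity_total, quantity_mean, revenue_total, and revenue_mean for each region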

Automatically detect and convert date and timestamp data types
When you open a file in Data Refinery, the Convert column type GUI operation is automatically applied as the first step if it detects any non-string data types in the data. In this release, date and timestamp data are detected and are automatically converted to inferred data types. You can change the automatic conversion for selected columns or undo the step. For information about the supported inferred date and timestamp formats, see the FREQUENTLY USED category in Convert column type in GUI operations in Data Refinery.
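
The conversion happens automatically when the file is opened, but as a point of reference, converting string values to date and timestamp types looks roughly like this in plain R. The formats, values, and variable names below are assumptions, not the full set of formats that Data Refinery can infer:

  # Sketch only: converting strings to Date and timestamp (POSIXct) values in plain R
  # (the formats shown are examples, not Data Refinery's full list of inferred formats)
  order_date <- as.Date("2020-11-15", format = "%Y-%m-%d")
  shipped_at <- as.POSIXct("2020-11-15 09:30:00",
                           format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

  class(order_date)  # "Date"
  class(shipped_at)  # "POSIXct" "POSIXt"
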
Change the decimal and thousands grouping symbols in all applicable columns
When you use the Convert column type GUI operation to detect and convert the data types for all the columns in a data asset, you can now also choose the decimal symbol and the thousands grouping symbol if the data is converted to an Integer data type or to a Decimal data type. Previously you had to select individual columns to specify the symbols.

For more information, see the FREQUENTLY USED category in Convert column type in GUI operations in Data Refinery.
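
For reference, the sketch below shows how the two symbols affect numeric parsing, using the readr R package as an analogy. The sample value and locale settings are made up, and this is not the code that Data Refinery runs:

  # Sketch only: decimal and thousands grouping symbols in numeric parsing,
  # illustrated with the readr package (not Data Refinery's own code)
  library(readr)

  # In "1.234,56" the grouping symbol is "." and the decimal symbol is ","
  parse_number("1.234,56",
               locale = locale(decimal_mark = ",", grouping_mark = "."))
  # Returns 1234.56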

Filter values in a Boolean column
You can now use the following operators in the Filter GUI operation to filter Boolean (logical) data:
  • Is false
  • Is true


For more information, see the FREQUENTLY USED category in Filter in GUI operations in Data Refinery.

In addition, Data Refinery includes a new template for filtering by Boolean values in the filter coding operation:
filter(`<column>` == <logical>)
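
For example, filled in for a hypothetical Boolean column named subscribed, the template looks like this:

  # Template filled in for a hypothetical Boolean column named "subscribed"
  filter(`subscribed` == TRUE)   # keep only the rows where the value is true
  filter(`subscribed` == FALSE)  # keep only the rows where the value is false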

For more information about the filter templates, see Interactive code templates in Data Refinery.

Data Refinery flows are supported in deployment spaces
You can now promote a Data Refinery flow from a project to a deployment space. Deployment spaces are used to manage a set of related assets in a separate environment from your projects. You can promote Data Refinery flows from multiple projects to a space. You run a job for the Data Refinery flow in the space and then use the shaped output as input for deployment jobs in Watson Machine Learning.

For instructions, see Promote a Data Refinery flow to a space in Managing Data Refinery flows.

Support for TSV files
You can now refine data in files that use the tab-separated value (TSV) format. TSV files are read-only: you can use TSV as a source format, but you cannot save Data Refinery output in TSV format.
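
A TSV file is an ordinary delimited file that uses the tab character as its delimiter. As a point of reference only, the sketch below reads one with the readr R package; the file name is made up, and this is not code that Data Refinery generates:

  # Sketch only: reading a tab-separated (TSV) file with the readr package
  # (hypothetical file name)
  library(readr)

  customers <- read_tsv("customers.tsv")
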
SJIS encoding available for input and output
SJIS (short for Shift JIS, or Shift Japanese Industrial Standards) is a character encoding for the Japanese language. SJIS encoding is supported only for CSV and delimited files.

You can change the encoding of input files and output files.

To change the encoding of the input file, click the "Specify data format" icon when you open the file in Data Refinery. See Specifying the format of your data in Data Refinery.

To change the encoding of the output (target) file in Data Refinery, open the Information pane and click the Details tab. Click the Edit button. In the DATA REFINERY FLOW OUTPUT pane, click the Edit icon.
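
As an analogy for the encoding choice, the sketch below reads a Shift JIS encoded CSV file with the readr R package. The file name is made up, and this is not the code that Data Refinery runs:

  # Sketch only: reading a Shift JIS (SJIS) encoded CSV file with the readr package
  # (hypothetical file name)
  library(readr)

  sales_jp <- read_csv("sales_jp.csv",
                       locale = locale(encoding = "Shift_JIS"))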

New jobs user interface for running and scheduling flows
For more information, see the What's new entry for Watson Studio.
New visualization charts
For more information, see the What's new entry for Watson Studio.