Transforming data (DataStage)

Use the DataStage® service to transform data to provide enriched and tailored information for your enterprise. DataStage is available as DataStage Enterprise and DataStage Enterprise Plus.

This service is not available by default. An administrator must install this service on the IBM® Cloud Pak for Data platform, and you must be given access to the service. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.

[Figure: Palette and canvas in DataStage]

With DataStage, you can create, edit, load, and run transformation jobs. DataStage includes built-in search, automatic metadata propagation, and simultaneous highlighting of all compilation errors, which help developers work more productively.

DataStage Enterprise Plus adds the following data quality features:
  • Cleansing data by identifying potential anomalies and metadata discrepancies.
  • Identifying duplicates by using data matching and probabilistic matching of data entities between two data sets.
Note: You must use DataStage Enterprise Plus to access the additional data quality functionality.
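
The probabilistic matching mentioned above can be illustrated with a small, generic sketch. This topic does not describe how DataStage implements matching, so the example uses a common textbook approach (Fellegi-Sunter style likelihood weights) purely as an illustration. The field names, the m and u probabilities, and the threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative, not taken from DataStage):
#   m = probability that the field agrees when two records are the same entity
#   u = probability that the field agrees by chance for different entities
FIELD_PROBS = {
    "name":  {"m": 0.95, "u": 0.01},
    "city":  {"m": 0.90, "u": 0.10},
    "email": {"m": 0.98, "u": 0.001},
}

def normalize(value: str) -> str:
    """Ignore case and surrounding whitespace when comparing values."""
    return value.strip().lower()

def match_weight(left: dict, right: dict) -> float:
    """Sum the log2 likelihood ratios over all fields.

    Agreement on a field adds log2(m / u); disagreement adds
    log2((1 - m) / (1 - u)), which is negative.
    """
    total = 0.0
    for field, p in FIELD_PROBS.items():
        if normalize(left[field]) == normalize(right[field]):
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total

def find_matches(set_a: list, set_b: list, threshold: float = 5.0) -> list:
    """Return (record_a, record_b, weight) for every cross-set pair
    whose weight is at or above the (hypothetical) threshold."""
    matches = []
    for a in set_a:
        for b in set_b:
            weight = match_weight(a, b)
            if weight >= threshold:
                matches.append((a, b, weight))
    return matches

# Two small, made-up data sets that describe overlapping people.
crm = [{"name": "Jon Smith", "city": "Boston", "email": "jon.smith@example.com"}]
billing = [
    {"name": "John Smith", "city": "boston", "email": "jon.smith@example.com"},
    {"name": "Maria Lopez", "city": "Austin", "email": "m.lopez@example.com"},
]

matches = find_matches(crm, billing)
for a, b, weight in matches:
    print(f"{a['name']} <-> {b['name']} (weight {weight:.2f})")
```

In this sketch the two Smith records are paired even though the names differ, because agreement on city and email outweighs the disagreement on name. The Lopez record disagrees on every field and falls well below the threshold.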

DataStage includes the following productivity features:
  • Search: Find what you need quickly by using the flexible Search feature.
  • Automatic metadata propagation: DataStage automatically propagates metadata from one stage to the later stages in the job, which increases productivity.
  • Highlighting of all compilation errors: DataStage highlights all errors at once and shows the problems when you hover over each stage, so you can fix multiple problems before you recompile.
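
As a conceptual illustration of metadata propagation and of reporting all errors at once, the following sketch walks a linear job, derives each stage's output columns from the previous stage, and collects every problem per stage instead of stopping at the first one. It is not DataStage code: the `Stage` class, its fields, and the stage names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    """Hypothetical stand-in for a job stage."""
    name: str
    adds: dict = field(default_factory=dict)      # columns created: name -> type
    drops: list = field(default_factory=list)     # columns removed
    requires: list = field(default_factory=list)  # columns read

def propagate_and_check(source_schema: dict, stages: list):
    """Propagate the schema through the stages in order.

    Returns the final schema and a dict of stage name -> error messages.
    Checking continues after an error so that all problems are reported
    in a single pass.
    """
    schema = dict(source_schema)
    errors = {}
    for stage in stages:
        # A stage can only read or drop columns that reach it from upstream.
        missing = [c for c in stage.requires + stage.drops if c not in schema]
        if missing:
            errors[stage.name] = [f"unknown column: {c}" for c in missing]
        # The output schema is the input schema minus drops, plus additions.
        schema = {k: v for k, v in schema.items() if k not in stage.drops}
        schema.update(stage.adds)
    return schema, errors

source = {"id": "int", "first": "str", "last": "str"}
stages = [
    Stage("Transformer", adds={"full_name": "str"}, requires=["first", "last"]),
    Stage("Filter", requires=["status"]),                 # "status" never defined
    Stage("Remove", drops=["first", "last", "middle"]),   # "middle" never defined
]

final_schema, errors = propagate_and_check(source, stages)
print(final_schema)
for stage_name, messages in errors.items():
    print(stage_name, messages)
```

Because the schema is carried forward automatically, only the source stage declares its columns. Both faulty stages are reported in the same pass, which mirrors the idea of fixing several problems before recompiling.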

DataStage features the following tabs, which you use for quick access to essential actions:

  • Projects
  • Connections
  • Table definitions
  • Jobs
  • Parameter sets

Each of the tabs has the same layout, with common functionality.

  • Items are shown in tile view, with the option for you to select a list view instead.
  • Menu options include Edit, Rename, Clone, and Delete (the available options vary by tab).
  • When you try to delete a connection, table definition, or job, a dialog box shows you where else the item is used so you can decide whether you still want to delete it.
  • You create a live connection or a job by clicking the + Create icon. When the live connection is used on the canvas, you can view the data for an existing table from the Details card.
  • You can navigate jobs by category by selecting Group by > Category from the Jobs tab. This grouping is effectively a "folder view". You can then drill down into each category to lower-level categories. Use the breadcrumb trail to navigate the category levels.
  • You can rename jobs, connections, and table definitions from the corresponding tabs by clicking the vertical ellipsis icon on each object.
  • From the Connections tab, you can import connections.
  • You import table definitions by using command line tools.
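
The "folder view" that results from grouping jobs by category can be pictured as a tree that you drill into one level at a time. The sketch below is only a mental model, not a DataStage API: it assumes that each job carries a slash-separated category path, and the job names, categories, and function names are hypothetical.

```python
def build_category_tree(jobs):
    """Group (job_name, category_path) pairs into nested categories."""
    root = {"jobs": [], "children": {}}
    for name, category in jobs:
        node = root
        # Create or reuse one tree level per path segment.
        for part in filter(None, category.split("/")):
            node = node["children"].setdefault(part, {"jobs": [], "children": {}})
        node["jobs"].append(name)
    return root

def open_category(tree, breadcrumb):
    """Follow a breadcrumb trail (list of category names) down the tree."""
    node = tree
    for part in breadcrumb:
        node = node["children"][part]
    return node

# Made-up jobs with assumed slash-separated category paths.
jobs = [
    ("load_orders", "Sales/EMEA"),
    ("load_invoices", "Sales/EMEA"),
    ("load_leads", "Sales/APAC"),
    ("nightly_cleanup", "Maintenance"),
]

tree = build_category_tree(jobs)
emea = open_category(tree, ["Sales", "EMEA"])
print(sorted(tree["children"]))           # top-level categories
print(sorted(tree["children"]["Sales"]["children"]))  # lower-level categories
print(emea["jobs"])                       # jobs in Sales > EMEA
```

Each entry in the breadcrumb list corresponds to one level of the category hierarchy, so shortening the list moves back up toward the top-level view.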