Transforming data (DataStage)

Use the DataStage® service to transform data to provide enriched and tailored information for your enterprise. DataStage is available as DataStage Enterprise and DataStage Enterprise Plus.

This service is not available by default. An administrator must install this service on the IBM Cloud Pak® for Data platform, and you must be given access to the service. To determine whether the service is installed, open the Services catalog in your Cloud Pak for Data environment and check whether the service is enabled.

Figure: Palette and canvas in DataStage

DataStage is a data integration tool that moves and transforms data between operational, transactional, and analytical target systems. Data integration specialists use DataStage to develop flows that process and transform data. DataStage provides hundreds of prebuilt transformation functions, parallel processing capabilities, and platform connectivity, so flows can connect directly to enterprise applications, cloud data sources, relational and NoSQL systems, REST endpoints, and more. You can administer, manage, deploy, and reuse these flows to integrate data across many systems throughout your organization.

Data format
Tabular: Avro, CSV, JSON, Parquet, TSV (read only), or delimited text files
Data size
Any
Required services
DataStage
Connectors
Example connectors include: Db2®, Netezza® Performance Server, Microsoft SQL Server, Oracle, Teradata, Snowflake, Microsoft Azure File Storage, Amazon Web Services and Google Cloud Platform services, and Amazon S3.

See DataStage connectors for the list of connectors that DataStage supports.

Stages
This service provides stages. Each stage describes a particular process, such as accessing a database or transforming data in some way. DataStage stages provide common functions for moving and transforming data. QualityStage stages support data quality tasks that include, but are not limited to, eliminating redundant, obsolete, or inaccurate data, standardizing data, and verifying address data.

See DataStage stages and QualityStage stages for information on the stages that DataStage supports.

DataStage Enterprise
With DataStage, you can create, edit, load, and run transformation jobs. DataStage has features like built-in search, automatic metadata propagation, and simultaneous highlighting of all compilation errors. Developers can use these features to be more productive.

  • DataStage provides an interactive user experience that you use to design flows. Use the parallel processing of DataStage in your data extraction and transformation applications.
  • Automatic metadata propagation: DataStage automatically propagates metadata from one stage to the stages that follow it in the flow, which increases productivity.
  • Highlighting of all compilation errors: DataStage highlights all errors and gives you a way to see problems with a quick hover over each stage, so you can fix multiple problems at the same time before you recompile.
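
The idea behind metadata propagation can be illustrated with a small conceptual sketch. This is not DataStage code or its API. The `Stage` class, the `propagate` function, and the column names are all hypothetical, and the sketch only assumes that each stage can add or drop columns and that later stages inherit the resulting schema.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a stage in a flow (not a DataStage type)."""
    name: str
    adds: dict = field(default_factory=dict)  # columns introduced: name -> type
    drops: tuple = ()                         # columns removed by this stage


def propagate(source_schema, stages):
    """Return the schema visible at the output of each stage, in flow order."""
    schema = dict(source_schema)
    seen = {}
    for stage in stages:
        # Remove dropped columns, then add the columns this stage introduces.
        schema = {col: typ for col, typ in schema.items() if col not in stage.drops}
        schema.update(stage.adds)
        seen[stage.name] = dict(schema)
    return seen


# Illustrative source schema and flow (invented for this example).
source = {"customer_id": "int", "name": "string", "raw_phone": "string"}
flow = [
    Stage("standardize", adds={"phone": "string"}, drops=("raw_phone",)),
    Stage("enrich", adds={"region": "string"}),
]
schemas = propagate(source, flow)
```

In this sketch, the `enrich` stage sees the `phone` column without anyone redefining it, because the change made in `standardize` flows forward automatically.
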
DataStage Enterprise Plus gives you additional useful features for data quality. These features include:
  • Cleansing data by identifying potential anomalies and metadata discrepancies.
  • Identifying duplicates by using data matching and probabilistic matching of data entities between two data sets.
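
As a rough illustration of probabilistic matching in general, the following sketch scores record pairs from two data sets with Fellegi-Sunter style agreement weights. It does not show how QualityStage implements matching. The field names, the m and u probabilities, the threshold, and the functions `match_weight` and `find_matches` are assumptions made up for this example.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "postcode": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(a, b):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    total = 0.0
    for fld, (m, u) in FIELD_PROBS.items():
        if a.get(fld) == b.get(fld):
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def find_matches(left, right, threshold=5.0):
    """Return index pairs (i, j) whose combined weight reaches the threshold."""
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if match_weight(a, b) >= threshold
    ]


# Two tiny illustrative data sets (invented records).
left = [{"name": "ana lopez", "postcode": "10001", "birth_year": 1980}]
right = [
    {"name": "ana lopez", "postcode": "10002", "birth_year": 1980},
    {"name": "bo chen", "postcode": "10001", "birth_year": 1975},
]
pairs = find_matches(left, right)
```

The first pair still counts as a match even though the postcodes differ, because agreement on the other fields outweighs the single disagreement. That tolerance for partial disagreement is what distinguishes probabilistic matching from exact-key matching.
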

For more information, see QualityStage stages.