IBM Information Server, Version 8.1

A closer look at WebSphere DataStage

In its simplest form, WebSphere® DataStage™ performs data transformation and movement from source systems to target systems in batch and in real time.
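The difference between batch and real-time movement can be sketched in plain Python. This is not DataStage code, because DataStage jobs are designed graphically rather than written by hand. It is a minimal illustration that assumes a hypothetical `transform` rule and applies it in two ways: to a whole batch of source records at once, or to records one at a time as they arrive from a message queue. The `None` end-of-stream sentinel is also an assumption of this sketch.

```python
from __future__ import annotations

from queue import Queue
from typing import Iterable, Iterator


# Hypothetical transformation rule: convert the id to an integer and tidy the name.
def transform(record: dict) -> dict:
    return {"id": int(record["id"]), "name": record["name"].strip().title()}


# Batch mode: the whole source extract is transformed and handed to the target together.
def run_batch(source: Iterable[dict]) -> list[dict]:
    return [transform(r) for r in source]


# Real-time mode: each record is transformed as soon as it arrives on the queue.
# A None sentinel marks the end of the stream (an assumption of this sketch).
def run_realtime(messages: Queue) -> Iterator[dict]:
    while (record := messages.get()) is not None:
        yield transform(record)


if __name__ == "__main__":
    source = [{"id": "1", "name": "  ada lovelace "}, {"id": "2", "name": "alan turing"}]
    print(run_batch(source))

    q: Queue = Queue()
    for r in source:
        q.put(r)
    q.put(None)
    for out in run_realtime(q):
        print(out)
```

The transformation logic is identical in both modes. Only the delivery pattern changes: a complete extract in batch mode, or a continuous stream in real-time mode.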

The data sources might include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications, and message queues. Some of the following transformations might be involved:

WebSphere DataStage can also treat the data warehouse as the source system that feeds a data mart as the target system. The data mart usually holds a localized subset of the data, such as customers, products, and geographic territories.
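The sketch below gives a rough illustration of the warehouse-as-source pattern. It derives a localized subset from warehouse rows by keeping one geographic territory and totaling sales per customer and product. The `SaleFact` record, its fields, and the per-territory aggregation are hypothetical choices made for this example. They are not a DataStage data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SaleFact:
    """Hypothetical warehouse fact row."""

    customer: str
    product: str
    territory: str
    amount: float


def build_mart(warehouse: list[SaleFact], territory: str) -> dict[tuple[str, str], float]:
    """Keep one territory's rows and total the amount per (customer, product)."""
    mart: dict[tuple[str, str], float] = {}
    for row in warehouse:
        if row.territory != territory:
            continue  # outside the localized subset this data mart serves
        key = (row.customer, row.product)
        mart[key] = mart.get(key, 0.0) + row.amount
    return mart


if __name__ == "__main__":
    warehouse = [
        SaleFact("acme", "widget", "EMEA", 100.0),
        SaleFact("acme", "widget", "EMEA", 50.0),
        SaleFact("globex", "gadget", "APAC", 75.0),
    ]
    print(build_mart(warehouse, "EMEA"))
```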

WebSphere DataStage delivers four core capabilities:

Where WebSphere DataStage fits within the IBM Information Server architecture

WebSphere DataStage is composed of client-based design, administration, and operation tools that access a set of server-based data integration capabilities through a common services layer. Figure 1 shows the clients that make up the WebSphere DataStage user interface layer.

Figure 1. WebSphere DataStage clients

Figure 2 shows the elements that make up the server architecture.

Figure 2. Server architecture (IBM Information Server architecture with Transform highlighted)

WebSphere DataStage architecture includes the following components:

Common user interface
The following client applications make up the WebSphere DataStage user interface:
WebSphere DataStage and QualityStage Designer
A graphical design interface that is used to create WebSphere DataStage applications (known as jobs). Because transformation is an integral part of data quality, the WebSphere DataStage and QualityStage Designer is the design interface for both WebSphere DataStage and WebSphere QualityStage.

Each job specifies the data sources, the required transformations, and the destination of the data. Jobs are compiled to create executables that are scheduled by the WebSphere DataStage and QualityStage Director and run on the WebSphere DataStage server. The Designer client writes development metadata to the dynamic repository, while the compiled execution data that is required for deployment is written to the WebSphere Metadata Server repository.

WebSphere DataStage and QualityStage Director
A graphical user interface that is used to validate, schedule, run, and monitor WebSphere DataStage job sequences. The Director client views data about jobs in the operational repository and sends project metadata to WebSphere Metadata Server to control the flow of WebSphere DataStage jobs.
WebSphere DataStage and QualityStage Administrator
A graphical user interface that is used for administration tasks such as setting up IBM® Information Server users; logging; creating and moving projects; and setting up criteria for purging records.
Common services
The multiple discrete services of WebSphere DataStage provide the flexibility needed to configure systems that support increasingly varied user environments and tiered architectures. The common services provide flexible, configurable interconnections among the many parts of the architecture:
  • Metadata services such as impact analysis and search
  • Execution services that support all WebSphere DataStage functions
  • Design services that support development and maintenance of WebSphere DataStage tasks
Common repository
The common repository holds three types of metadata that are required to support WebSphere DataStage:
Project metadata
All project-level metadata components, including WebSphere DataStage jobs, table definitions, built-in stages, reusable subcomponents, and routines, are organized into folders.
Operational metadata
The repository holds metadata that describes the operational history of integration process runs, success or failure of jobs, parameters that were used, and the time and date of these events.
Design metadata
The repository holds design-time metadata that is created by the WebSphere DataStage and QualityStage Designer and WebSphere Information Analyzer.
Common parallel processing engine
The engine runs executable jobs that extract, transform, and load data in a wide variety of settings. The engine uses parallelism and pipelining to handle high volumes of work more quickly.
Common connectors
The connectors provide connectivity to a large number of external resources and access to the common repository from the processing engine. Any data source that is supported by IBM Information Server can be used as input to or output from a WebSphere DataStage job.
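A small Python model can illustrate how the engine, the connectors, and the operational metadata fit together. Everything in it is hypothetical: `Connector`, `InMemoryConnector`, `RunRecord`, `run_job`, and the FINISHED/ABORTED status strings are names invented for this sketch and are not part of any IBM Information Server API. The model does three things:

- It hash-partitions input rows by a key so that each partition can be processed in parallel.
- It pushes each partition through a chain of generator stages, so rows flow from stage to stage without waiting for the previous stage to finish (pipelining).
- It records the outcome, parameters, and timestamps of the run, which are the kinds of facts the operational metadata describes.

The same connector interface serves as both source and target.

```python
from __future__ import annotations

import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, Iterator, Protocol

Record = dict
Stage = Callable[[Iterator[Record]], Iterator[Record]]


class Connector(Protocol):
    """Hypothetical connector interface: any source or target can be read or written."""

    def read(self) -> Iterable[Record]: ...

    def write(self, records: Iterable[Record]) -> int: ...


@dataclass
class InMemoryConnector:
    """In-memory stand-in for a database, file, or message-queue connector."""

    rows: list[Record] = field(default_factory=list)

    def read(self) -> Iterable[Record]:
        return list(self.rows)

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


@dataclass
class RunRecord:
    """Hypothetical operational metadata for one run: outcome, parameters, timestamps."""

    job: str
    parameters: dict
    started: datetime
    finished: datetime | None = None
    status: str = "RUNNING"
    rows_written: int = 0
    error: str | None = None


def partition(records: Iterable[Record], key: str, n: int) -> list[list[Record]]:
    # Hash partitioning: rows with the same key value always land in the same partition.
    parts: list[list[Record]] = [[] for _ in range(n)]
    for r in records:
        parts[zlib.crc32(str(r[key]).encode()) % n].append(r)
    return parts


def run_pipeline(rows: list[Record], stages: list[Stage]) -> list[Record]:
    # Pipelining: each stage is a generator that consumes rows as the previous one yields them.
    stream: Iterator[Record] = iter(rows)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


def run_job(name: str, source: Connector, target: Connector, stages: list[Stage],
            key: str, partitions: int = 4) -> RunRecord:
    run = RunRecord(name, {"key": key, "partitions": partitions}, datetime.now(timezone.utc))
    try:
        parts = partition(source.read(), key, partitions)
        # One worker per partition; the results are written to the target one at a time.
        with ThreadPoolExecutor(max_workers=partitions) as pool:
            for rows in pool.map(run_pipeline, parts, [stages] * partitions):
                run.rows_written += target.write(rows)
        run.status = "FINISHED"
    except Exception as exc:  # record the failure in the run history instead of crashing
        run.status, run.error = "ABORTED", str(exc)
    finally:
        run.finished = datetime.now(timezone.utc)
    return run


# Example stages (hypothetical business rules).
def drop_inactive(rows: Iterator[Record]) -> Iterator[Record]:
    return (r for r in rows if r["active"])


def normalize(rows: Iterator[Record]) -> Iterator[Record]:
    for r in rows:
        yield {**r, "customer": r["customer"].upper()}


if __name__ == "__main__":
    src = InMemoryConnector([{"customer": c, "active": a} for c, a in [("a", True), ("b", False), ("c", True)]])
    dst = InMemoryConnector()
    print(run_job("load_customers", src, dst, [drop_inactive, normalize], key="customer", partitions=2))
```

Threads are used here only to keep the sketch short. Because of CPython's global interpreter lock, threads do not speed up CPU-bound stages. A process pool or separate machines would be the more realistic choice for that kind of work. Hash partitioning is likewise just one plausible strategy picked for illustration.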
Related concepts
IBM Information Server architecture and concepts

This topic is also in the IBM Information Server Introduction.

Last updated: 2008-09-15