Importing ETL jobs (Watson Knowledge Catalog)

Import ETL jobs to govern those jobs as data integration assets in catalogs.

You can import ETL jobs starting in Cloud Pak for Data 4.7.2.

This import option is not available in projects that are marked as sensitive.

Before you import metadata, design your metadata import so that you understand all your options and make the appropriate choices for your goals. See Designing metadata imports.

You can also use APIs instead of the user interface to retrieve the list of supported connections or to create a metadata import asset. The links to these APIs are listed in the Learn more section.

Asset types
Data integration assets that represent components of ETL jobs. See Asset types created through metadata import.
Supported connections
See the Other data sources section in Supported connectors.
Required permissions
To create, manage, and run a metadata import, you must have these roles and permissions:
  • The Manage asset discovery user permission.
  • The Admin or the Editor role in the project.
  • The Admin or the Editor role in the catalog to which you want to import the assets.
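
The three requirements above must all be met, while each role requirement accepts either of two roles. The following minimal sketch spells out that logic. The User class and its fields are hypothetical and only illustrate the rule; they are not part of any Cloud Pak for Data API.

```python
from dataclasses import dataclass, field

# Roles that satisfy the project and catalog requirements listed above.
ALLOWED_ROLES = {"Admin", "Editor"}


@dataclass
class User:
    # Hypothetical model of a user, for illustration only.
    permissions: set = field(default_factory=set)
    project_role: str = ""
    catalog_role: str = ""


def can_run_etl_import(user: User) -> bool:
    """Return True only if all three listed requirements are met."""
    return (
        "Manage asset discovery" in user.permissions
        and user.project_role in ALLOWED_ROLES
        and user.catalog_role in ALLOWED_ROLES
    )
```

For example, a user with the Manage asset discovery permission who is an Editor in the project but only a Viewer in the target catalog does not pass this check.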

Prerequisites

Before you can import an ETL job, complete the following prerequisite tasks:

  • For InfoSphere DataStage, Talend, or Informatica PowerCenter ETL jobs, create an ETL job file and upload it to your project. See Preparing ETL job files.
  • For DataStage flows (DataStage on Cloud Pak for Data), select flows from your project.

Creating the metadata import asset and importing ETL jobs

To create a metadata import asset and a job for importing ETL jobs to a catalog:

  1. Open a project, go to the project's Assets page, and click New asset > Metadata Import.

  2. Select the option Import ETL job and click Next. If you don't see this option, the Advanced metadata import feature is not enabled. See Installed features and license requirements.

  3. Specify a name for the metadata import. Optionally, you can provide a description.

  4. Optional: Select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.

  5. For this type of import, the import target can be only a catalog. See Scope of import.

    Select a catalog from the list. If your project is marked as sensitive, you can't create and run ETL job imports.

  6. Define a scope for the metadata import. See Scope of import.

    1. Click Select file to pick an ETL job file from your project for the import. You can select only one file at a time. See Preparing ETL job files.

      An ETL job file is static. You can't update its content for a later rerun of the metadata import. To work with a new version of the ETL job file, you must create a new metadata import.

    2. After you select an ETL job file, select the data integration tool that the file comes from so that the file structure is interpreted correctly.

    If you want to add DataStage flows that exist in your project to a catalog, you have these options:

    • To select individual flows, click Select file and select the flows that you want to import. You can select more than one flow for a single metadata import. In Cloud Pak for Data 4.7.0 and 4.7.1, use this option only for DataStage flows that don't have any dependencies other than connections. Starting in Cloud Pak for Data 4.7.2, all DataStage flow dependencies are automatically included in the scope.
    • To select all DataStage flows, use the Select all DataStage flows and their dependencies in the project option. With this option, all DataStage flows in your project are included in the scope, and you skip the step for individual selection.
  7. Define whether you want to run scheduled import jobs. If you don't set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. See Scheduling options.

  8. Optional: Customize the import behavior. You can choose to prevent specific properties from being updated and to delete existing assets that are not included in the reimport. See Advanced import options.

  9. Review the metadata import configuration. To make changes, click the edit icon on the tile and update the settings.

  10. Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import runs on the defined schedule.

    Important:

    If the ETL job or set of DataStage flows was already imported through a different metadata import, it is not imported again but is updated. The data integration assets no longer show up in the initial metadata import. Only the most recently run metadata import contains the assets.
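
The behavior described in the Important note can be pictured with a small toy model. This sketch is not product code: the ImportRegistry class, its method names, and the asset keys are hypothetical, and tracking ownership per asset is a simplifying assumption made for illustration.

```python
class ImportRegistry:
    """Toy model: tracks which metadata import currently lists each asset."""

    def __init__(self):
        # Maps an asset key to the name of the import that ran most recently.
        self._owner = {}

    def run_import(self, import_name, asset_keys):
        """Record a run and return (created, updated) asset keys."""
        created, updated = [], []
        for key in asset_keys:
            # An asset that is already known is updated, not created again.
            (updated if key in self._owner else created).append(key)
            # The most recently run import takes over the asset.
            self._owner[key] = import_name
        return created, updated

    def assets_of(self, import_name):
        """Return the asset keys that this import currently lists."""
        return sorted(k for k, v in self._owner.items() if v == import_name)


registry = ImportRegistry()
registry.run_import("first_import", ["job1", "job2"])
created, updated = registry.run_import("second_import", ["job1", "job2"])
```

After the second run in this model, nothing is created, both assets are reported as updated, and first_import no longer lists any assets.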

Depending on the outcome of the metadata import job run, a completion message or an error notification is displayed.

A completion message is displayed when the job run completes successfully, completes with warnings, or completes with errors. An error notification is displayed if the entire job run fails. Either type of notification contains a link to the job run log that provides details about the specific job run.

When the import is complete, you can see the list of assets with the following information:

  • The asset name, which provides a link to the asset in the catalog.
  • The asset type, such as Data integration job.
  • The asset context, such as the file path of the element.
  • The date and time that the asset was last imported.
  • The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported.
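
If you keep your own record of the asset list, for example after copying it by hand, a short tally by import status can help you spot assets that need attention. The sketch below assumes a hypothetical list of dictionaries with name and status keys. It does not read anything from the product, and the sample records are made up.

```python
from collections import Counter

# Status values taken from the list above.
KNOWN_STATUSES = ("Imported", "In progress", "Removed")


def summarize_statuses(assets):
    """Count assets per import status, including statuses with zero assets."""
    counts = Counter(asset["status"] for asset in assets)
    return {status: counts.get(status, 0) for status in KNOWN_STATUSES}


# Hypothetical sample records for illustration.
sample_assets = [
    {"name": "load_customers", "status": "Imported"},
    {"name": "load_orders", "status": "Imported"},
    {"name": "old_cleanup_job", "status": "Removed"},
]
summary = summarize_statuses(sample_assets)
```

A nonzero count for Removed points to assets that could not be reimported and might need a closer look in the job run log.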

In the catalog, the imported assets have a tag automatically assigned that reflects the originating data integration tool.

Learn more

Next steps
