Connecting data to IBM Master Data Management
Use the IBM Master Data Management connection to connect your entity, record, and relationship data to data assets, catalogs, and DataStage flows.
By using connected data, you can transform, refine, or analyze data before you bring it into the IBM Master Data Management service, or you can export master data from IBM Master Data Management to other services for refinement, analysis, visualization, and governance.
For more information about creating an IBM Master Data Management connection, see IBM Master Data Management connection.
Using the IBM Master Data Management connection
You can use connected IBM Master Data Management data in the following workspaces and tools:
Projects
- DataStage (DataStage service). See Connecting to a data source in DataStage.
Catalogs
- Platform assets catalog
- Other catalogs (IBM watsonx.data intelligence)
If you are working with a governed catalog, you can view or add only the catalog assets for which you are the data asset owner.
Example IBM Master Data Management connected data scenarios
The following sections provide details about how to achieve some common scenarios for connecting master data.
- Importing data into IBM Master Data Management by using DataStage and a database connection
- Exporting master data by using DataStage and the IBM Master Data Management connection
- Exporting connected master data assets to be used in Cloud Pak for Data assets and catalogs
Importing data into IBM Master Data Management by using DataStage and a database connection
To complete this task, you must have:
- A working source database and its connection credentials
- A working IBM Master Data Management service instance that is fully set up
- DataStage
To import data into IBM Master Data Management by using DataStage and a database connection:
-
Create your connection assets:
- Source database connection: A connection to the source database using any connection supported by Cloud Pak for Data
- Target connection: An IBM Master Data Management connection
- Write mode: Define whether to load the data in bulk (Load) or in small chunks on an ongoing basis (Ongoing synchronization). The ongoing synchronization option is useful for Delta-load scenarios.
For details about creating these connections, see Connectors and its subtopics.
-
Set up a DataStage flow to move data into IBM Master Data Management. For example: [Source database connection] node -> COPY stage -> [IBM Master Data Management connection] node.
-
Link all three nodes together.
-
Set up source properties such as connection, schema name, and table name.
-
Select the data columns you want to move.
a. In the source database node's detail panel, click Edit > Import data, and select the columns you want to move into IBM Master Data Management. You can optionally select a subset of the columns.
b. Change any LONGVARCHAR columns to the VARCHAR data type and set their length to 200.
c. Click Apply.
-
Map columns to the IBM Master Data Management data types.
a. Select the COPY stage.
b. Go to the Input tab and confirm that it lists the columns you selected in the previous step.
c. Go to the Output tab, then click Edit.
d. Map the current data column names to your IBM Master Data Management data types.
- Select and delete all of the output columns. Click Import data, then find the corresponding data columns from within the IBM Master Data Management connection.
- Click Import to save the changes.
- Back on the Output tab, click Edit again, then use the "Map from input column" column to map the IBM Master Data Management column names to the source column names.
- Click Apply and return.
-
Set up target properties.
a. Click the IBM Master Data Management connection tab.
b. Choose the IBM Master Data Management connection, and then define the data category, data type (record type), and entity type, such as record, person, and person_record. Use names that exactly match your IBM Master Data Management data types.
c. Click Save.
-
Save, compile, and run the DataStage flow. After you see a success notification, the data is loaded into IBM Master Data Management and is now ready to match.
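The column preparation in the preceding steps (converting LONGVARCHAR columns to VARCHAR with a length of 200, then mapping source columns to names that exactly match the IBM Master Data Management data types) can be checked outside DataStage before you build the flow. The following Python sketch is illustrative only. The Column class, both helper functions, the source column names, and the target attribute names are hypothetical and are not part of DataStage or the IBM Master Data Management connection.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Column:
    """Hypothetical description of one source column."""
    name: str
    sql_type: str
    length: Optional[int] = None


def normalize_long_varchar(columns: Iterable[Column], length: int = 200) -> List[Column]:
    """Return a copy of the columns with LONGVARCHAR changed to VARCHAR(length)."""
    return [
        replace(col, sql_type="VARCHAR", length=length)
        if col.sql_type.upper() == "LONGVARCHAR" else col
        for col in columns
    ]


def check_mapping(source_columns: Iterable[Column],
                  mapping: Dict[str, str],
                  target_attributes: Set[str]) -> List[str]:
    """List mapping problems; target names are compared exactly (case-sensitive)."""
    source_names = {col.name for col in source_columns}
    problems = []
    for source_name, target_name in mapping.items():
        if source_name not in source_names:
            problems.append(f"unknown source column: {source_name}")
        if target_name not in target_attributes:
            problems.append(f"no exact match for target attribute: {target_name}")
    return problems


# Hypothetical source table and target attribute names, for illustration only.
source_columns = [
    Column("FIRST_NAME", "VARCHAR", 100),
    Column("LAST_NAME", "VARCHAR", 100),
    Column("NOTES", "LONGVARCHAR"),
]
prepared = normalize_long_varchar(source_columns)

target_attributes = {"legal_name.given_name", "legal_name.last_name"}
mapping = {
    "FIRST_NAME": "legal_name.given_name",
    "LAST_NAME": "legal_name.Last_Name",  # wrong case on purpose
}
problems = check_mapping(prepared, mapping, target_attributes)
```

In this sketch, problems contains one entry for the miscased target name, which mirrors the requirement to use names that exactly match your data types.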
Exporting master data by using DataStage and the IBM Master Data Management connection
To complete this task, you must have:
- A working target database and its connection credentials
- A working IBM Master Data Management service instance that is fully set up
- DataStage
To export master data by using DataStage and the IBM Master Data Management connection:
-
Create your connection assets:
- Source connection: An IBM Master Data Management connection
- Target database connection: A connection to the target database using any connection supported by Cloud Pak for Data
- Export job ID (optional): The job ID of a completed IBM Master Data Management export job that you want to extract data from. If a value is provided for this field, the other input fields are ignored and the service exports the data associated with the provided export ID.
- Filter rule (optional): JSON-formatted search criteria that define which record or entity data is included in the export. If no filter rule is provided, all of the data is exported. Filter rules can contain a nested combination of expressions. Each expression contains a property, a condition, and a value. For example, the following JSON payload defines a filter that exports record or entity data that has a value for personal email and in which either the Legal Given Name starts with S or the Legal Last Name starts with M. Additionally, as defined by the includeDeletes flag, this example export includes information about any entity data that was deleted after the defined last updated date (entity_last_updated):
  {
    "query": {
      "expressions": [
        {
          "operation": "or",
          "expressions": [
            { "property": "legal_name.given_name", "condition": "starts_with", "value": "S" },
            { "property": "legal_name.last_name", "condition": "starts_with", "value": "M" }
          ]
        },
        { "property": "personal_email.email_id", "condition": "has_value", "value": null },
        { "property": "entity_last_updated", "condition": "greater_than_equal", "value": "0", "expressions": [] }
      ],
      "operation": "and"
    },
    "includeDeletes": true
  }
For more information about creating these connections, see Connectors and its subtopics.
-
Set up a DataStage flow, such as [IBM Master Data Management connection] node -> [Target database connection] node. Unlike with an import flow, you don't need a COPY stage because mapping data columns is not necessary.
-
Set up your source properties.
a. Select the IBM Master Data Management connection node.
b. In the Stage tab, configure the IBM Master Data Management database properties. Define the data category, data type (record type), and entity type for the data that you want to export, such as record, person, and person_record, using names that exactly match your IBM Master Data Management data types.
c. Click Save. In the Output tab, browse and select the columns you want to move. Select the IBM Master Data Management connection, then select the necessary data from the data types.
-
Set up your target properties.
a. In the target database node, define how you want the data to be applied to the database, such as insert, update, or merge. Then define the schema name and table name.
b. Click Save.
-
Save, compile, and run the DataStage flow. After you see a success notification, the data is loaded into the target database.
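Because a filter rule is a nested combination of expressions, it can be easier to assemble it programmatically than to edit the JSON by hand. The following Python sketch rebuilds the example filter rule from step 1 and checks its shape before you paste it into the Filter rule field. The helper names (leaf, group, build_filter_rule, count_leaves) are hypothetical. The sketch assumes only the operations and conditions that appear in the example, and it omits the empty "expressions" list that the example includes on the entity_last_updated expression; whether the service requires that key is not stated here.

```python
import json
from typing import Any, Dict


def leaf(prop: str, condition: str, value: Any = None) -> Dict[str, Any]:
    """One expression: a property, a condition, and a value."""
    return {"property": prop, "condition": condition, "value": value}


def group(operation: str, *expressions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine expressions; only "and" and "or" are assumed, as in the example."""
    if operation not in ("and", "or"):
        raise ValueError(f"unsupported operation in this sketch: {operation}")
    return {"operation": operation, "expressions": list(expressions)}


def build_filter_rule(query: Dict[str, Any], include_deletes: bool = False) -> Dict[str, Any]:
    """Wrap a query in the top-level structure used by the example payload."""
    return {"query": query, "includeDeletes": include_deletes}


def count_leaves(node: Dict[str, Any]) -> int:
    """Walk the nested expressions and fail on malformed nodes."""
    if "operation" in node:
        return sum(count_leaves(child) for child in node["expressions"])
    missing = {"property", "condition", "value"} - node.keys()
    if missing:
        raise ValueError(f"expression is missing keys: {sorted(missing)}")
    return 1


# Rebuild the example: (given name starts with S OR last name starts with M)
# AND personal email has a value AND entity_last_updated >= "0".
rule = build_filter_rule(
    group(
        "and",
        group(
            "or",
            leaf("legal_name.given_name", "starts_with", "S"),
            leaf("legal_name.last_name", "starts_with", "M"),
        ),
        leaf("personal_email.email_id", "has_value"),
        leaf("entity_last_updated", "greater_than_equal", "0"),
    ),
    include_deletes=True,
)

leaf_count = count_leaves(rule["query"])
payload = json.dumps(rule, indent=2)  # text to paste into the Filter rule field
```

The value None is serialized as JSON null, which matches the has_value expression in the example.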
Exporting connected master data assets to be used in Cloud Pak for Data assets and catalogs
To complete this task, you must have:
- A working IBM Master Data Management service instance that is fully set up.
To export master data assets to be used in Cloud Pak for Data assets and catalogs:
-
Create an IBM Master Data Management connection.
a. In the Project > Assets action bar, click New asset > Connect to a data source.
b. Choose the IBM Master Data Management connection.
c. Provide a name and a description for this connection, then provide the remaining connection details of the IBM Master Data Management service instance that you want to connect to. For more information about creating a connection, see IBM Master Data Management connection.
-
Create an IBM Master Data Management connected data asset.
a. In the Project > Assets action bar, click Import asset > Connected data.
c. Click Select source, then choose a connection source from the available databases.
d. Choose your IBM Master Data Management connection, then select the data types that you want to include in this connected data asset.
e. Provide a name and description for your connected data asset, then click Create. You can now see the new connected data asset in your project's Assets tab.
-
Publish the data asset to a catalog.
a. From the project's Assets tab, find your data asset, then select Publish to catalog from the asset's overflow menu.
b. Choose the catalog that you want to publish to, give the job a description, and configure duplicate action behavior and privacy.
c. Optionally, add tags to help you organize and find your data, then click Publish. The data is now available in the catalog. For more information about working with catalog assets, see Catalog assets.