Multi-import node

In Synthetic Data Generator, you can use the Multi-import node to import data from several different sources. You can then generate synthetic data based on all these data sources.

Description
Use the Multi-import node to import several data tables from a database connection to use as sample seed data.
The Multi-import node imports these datasets as a dataset group, where referential integrity is used to define the relationships in the group. You can then use this group of related datasets to generate synthetic data that has similar relationships and distributions as all the original tables that you imported. For more information, see Referential integrity and multi-table nodes.
If you use scripting, you can also import files that are saved as project assets.
Using the node
The Multi-import node is usually the first node in a Synthetic Data Generator flow. It imports datasets that are stored in a structured format, such as a table, and then passes the datasets to the next node for further processing. For more information, see Data sources for Synthetic Data Generator.
Use only one Multi-import node to import multiple tabular datasets. You cannot use multiple nodes to combine data.
If you change the sources or settings in the Multi-import node after the flow runs, rerun the Multi-mimic node before you use Run all to run the flow again. Running the Multi-mimic node refreshes the Multi-gen node.
Mandatory or optional
The Multi-import node is mandatory if you want to use prodution data from multiple data tables as the seed for the synthetic data. If you want to use referential integrity, you cannot generate synthetic data by defining a custom data schema for the synthetic data.

Scripting with the Multi-import node

You can use scripting languages, like Python, to progammatically set properties for nodes.

Note: You can use the Multi-import node to import data from data assets only when using scripting. You cannot select data assets in the user interface.

Multi-import node properties

The following properties are specific to the Multi-import node. For information about common node properties, see Properties for flows and nodes.

Table 1. Node properties for scripting
Property Name Data type Property description
assets Structured property A list of assets, where each asset contains the following: id, name, assettype, connpath, primarykey, usersettings. For details, see Data structure for assets property.
relationships Structured property A list of table relationships, where each relationship includes the following: fromtable, totable, fromcolumns, tocolumns, relationtype. For details, see Data structure for relationships property.

Data structure for assets property

id
The ID for the asset or connection
name
The name of the asset or connection in the project
assettype
The type of asset (DataAsset or Connection)
You can only import data assets through scripting.
connpath
When Connection is set for asset_type, the path to the table within the connection.
Depending on the database connection, the path includes the catalog and schema, for example catalog_name/schema_name/table_name.
primarykey
The primary keys for one or more columns in the source table
You can set several columns in primarykey by using a list of strings, for example, ['ACCOUNT_ID','DISTRICT_ID']. Composite primary keys are not supported.
usersettings
Escaped JSON string of interaction properties.
For example, {"interactionProperties":{"file_format":"csv","encoding":"UTF-8","first_line_header":true,"infer_schema":true,"infer_record_count":1000,"infer_as_varchar":false,"invalid_data_handling":"fail","file_name":"input.csv"}}

Data structure for relationships property

fromtable
The source table with the primary key
Use dot notation for the table name
totable
The destination table that references the primary key
Use dot notation for the table name
fromcolumns
Columns from the source table
tocolumns
Target columns in the destination table
relationtype
Relationship type (ONE_TO_ONE or ONE_TO_MANY)

Table names in dot notation

When you reference tables, you must use dot notation in most cases. The path in dot notation changes depending on if the table has a catalog and schema.

Catalog and schema
assetId.catalog.schema.table
Only schema
assetId.schema.table
No schema
assetId.table
For example, with IBM Cloudant
Data assets
assetId.assetName
For example, for CSV and SAV files

Example script

The following script creates a Multi-import node and sets some properties for it.

stream = sdg.script.stream()
connections = stream.createAt("dataasset_import_multi_table", "Connections", 88, 187)
connections.setPropertyValue("assets", "[['40039389-1b77-47d5-86d1-b37a5e6bf52e', 'DB2 Premium', 'Connection', '/PERF/CATEGORIES', 'CATEGORY_ID', ''], ['40039389-1b77-47d5-86d1-b37a5e6bf52e', 'DB2 Premium', 'Connection', '/PERF/CUSTOMERS', 'CUSTOMER_ID', ''], ['40039389-1b77-47d5-86d1-b37a5e6bf52e', 'DB2 Premium', 'Connection', '/PERF/PRODUCTS', 'PRODUCT_ID', ''], ['40039389-1b77-47d5-86d1-b37a5e6bf52e', 'DB2 Premium', 'Connection', '/PERF/SALES', 'SALE_ID', '']]")
connections.setPropertyValue("relationships", "[['40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.CATEGORIES', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.PRODUCTS', ['CATEGORY_ID'], ['CATEGORY_ID'] , 'ONE_TO_MANY'], ['40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.CUSTOMERS', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.SALES', ['CUSTOMER_ID'], ['CUSTOMER_ID'] , 'ONE_TO_MANY'], ['40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.PRODUCTS', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.SALES', ['PRODUCT_ID'], ['PRODUCT_ID'] , 'ONE_TO_MANY']]")