Multi-gen node

In Synthetic Data Generator, you use the Multi-gen node to create structured synthetic data from a group of datasets.

Description: The Multi-gen node automatically creates synthetic data by using the statistical properties and relationships in the source datasets. Synthetic Data Generator uses various algorithms and statistical models to analyze these properties and relationships when the Multi-mimic node runs on the existing datasets. The Multi-gen node then uses this information to generate artificial yet realistic-looking data.
Using the node: The Multi-gen node is used when you have imported data from several sources by using the Multi-import node. For more information, see Referential integrity and multi-table nodes. You do not need to add a Multi-gen node to the Synthetic Data Generator flow. When you run a flow that has a Multi-mimic node in it, a Multi-gen node is automatically created and added after the Multi-mimic node. On subsequent runs, the existing Multi-gen node is updated.
Mandatory or optional: The Multi-gen node is mandatory. However, you don't add the Multi-gen node to the flow yourself. It is automatically added after a Multi-mimic node runs for the first time. If you want to use referential integrity to generate synthetic data, you must use production data. You cannot define a custom data schema for the synthetic data.

Scripting with the Multi-gen node

You can use scripting languages, like Python, to programmatically set properties for nodes.

Multi-gen node properties

The following properties are specific to the Multi-gen node. For information about common node properties, see Properties for flows and nodes.

Table 1. Node properties for scripting
Property name	Data type	Allowed values	Property description
`ratio`	Float	Range: 0.0 < `value` ≤ 10.0	This property specifies the number of rows to generate for each individual table. It uses the following formula: `value for ratio` × `number of rows in the fitting phase`. A minimum of one record is always generated.
`random_state`	Integer	No specific range. Default value: 929111600	The model needs a random seed to initialize some parameters. You can specify the random seed here. If `None` is specified, the current timestamp is used instead. However, using the timestamp means the model uses a different seed each time that it runs. Setting a fixed seed ensures consistent results.
`n_samples`	Structured property	See property description	This property sets the number of rows to generate for each table. It contains a list of dictionaries (or arrays), where each entry includes the following fields: `table_display_name`, `row_count`, `custom_sample`, `table_name`. For details, see Data structure for `n_samples` property.
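
The `ratio` formula in the table can be sketched in plain Python. This is only an illustration of the arithmetic, not the product's implementation: the function name `estimate_row_count` is hypothetical, and rounding fractional results down is an assumption, because the table does not say how fractions are handled.

```python
import math

def estimate_row_count(ratio: float, fitted_rows: int) -> int:
    """Estimate rows generated for one table: ratio x fitted rows, never below 1."""
    # Range taken from the property table: 0.0 < value <= 10.0.
    if not 0.0 < ratio <= 10.0:
        raise ValueError("ratio must satisfy 0.0 < ratio <= 10.0")
    # Assumption: fractional results are rounded down; the source does not specify this.
    return max(1, math.floor(ratio * fitted_rows))

print(estimate_row_count(2.0, 500))   # prints 1000
print(estimate_row_count(0.001, 10))  # prints 1 (the documented minimum of one record)
```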

Data structure for `n_samples` property

table_display_name: The display name of the table.
row_count: The number of rows to generate.
custom_sample: This field controls whether row_count is used as a fixed value (true) or scaled by the ratio property (false). It is mostly used in the user interface. For scripting, set it to true only for the tables where you specify a value for row_count.
table_name: The table name in dot notation. For an example of the format, see Multi-import node.
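
The four fields above can be assembled with a small helper, shown here as a minimal sketch in plain Python. The names `FIELDS`, `make_entry`, and `as_array` are hypothetical and not part of the scripting API. The `<prefix>` placeholder stands for the identifier that precedes the table name in the example script, and passing the flag as the string 'true' or 'false' follows that script rather than a documented rule.

```python
# Field order as listed in the property description.
FIELDS = ("table_display_name", "row_count", "custom_sample", "table_name")

def make_entry(display_name, row_count, fixed, table_name):
    """Return one n_samples entry as a dictionary keyed by the documented fields."""
    return {
        "table_display_name": display_name,
        "row_count": row_count,
        # Assumption: the flag is passed as a string, as in the example script.
        "custom_sample": "true" if fixed else "false",
        "table_name": table_name,
    }

def as_array(entry):
    """Convert a dictionary entry to the positional array form."""
    return [entry[name] for name in FIELDS]

entry = make_entry("PERF.SALES", 4000, True, "<prefix>.PERF.SALES")
print(as_array(entry))
```

A list of such entries, in either form, is the kind of value that the `n_samples` property description refers to.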

Example script

The following script finds a Multi-gen node in a Synthetic Data Generator flow and sets some properties for it.

# Get the current flow and locate its Multi-gen node.
stream = sdg.script.stream()
generateplus = stream.findByType("generateplus", None)

# Set the row ratio and a fixed random seed.
generateplus.setPropertyValue("ratio", 2.0)
generateplus.setPropertyValue("random_state", 29)

# Each entry: [table_display_name, row_count, custom_sample, table_name]
generateplus.setPropertyValue("n_samples", [
    ['PERF.CATEGORIES', 10, 'false', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.CATEGORIES'],
    ['PERF.CUSTOMERS', 1000, 'false', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.CUSTOMERS'],
    ['PERF.PRODUCTS', 20, 'false', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.PRODUCTS'],
    ['PERF.SALES', 4000, 'false', '40039389-1b77-47d5-86d1-b37a5e6bf52e.PERF.SALES']
])
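
The script sets `random_state` to a fixed value. The general effect of a fixed seed can be illustrated with Python's standard `random` module. This is only an analogy for seeded generation and says nothing about the generator's internals. The helper name `sample_values` is hypothetical.

```python
import random

def sample_values(seed, count=3):
    """Draw a few pseudo-random integers from a generator initialized with seed."""
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.randint(0, 999) for _ in range(count)]

# Two runs with the same seed produce the same sequence.
print(sample_values(29) == sample_values(29))  # prints True
```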
