Import node

In Synthetic Data Generator, you can use the Import node to import source data from databases or files to use as sample seed data.

Description
Use the Import node in a Synthetic Data Generator flow to bring in the source data that you can generate synthetic data from. You can import data from a database connection or from a file that is saved as a project asset.
Using the node
The Import node is usually the first node in a Synthetic Data Generator flow. Unless you are using a custom schema, you need to import data before you can generate synthetic data.
It imports datasets that are stored in a structured format, such as a table, and then passes the dataset to the next node for further processing. For more information, see Data sources for Synthetic Data Generator.
Although you can add several Import nodes to a Synthetic Data Generator flow, you cannot use multiple nodes to combine data.
Mandatory or optional
The Import node is mandatory if you're using production data as the seed for the synthetic data. If you choose to generate synthetic data from a custom schema, then you only need the Generate node.

Overriding storage type

When Synthetic Data Generator imports data, it processes a sample of the records in the data to infer the structure of the data and the types of data. During this process, Synthetic Data Generator sets the storage type that is required for the data in each field. For example, it might infer that an Age field needs the Real storage type. However, if the inference is incorrect, you can override it.

  1. In the settings for the Import node, expand the Storage section.
  2. Select Override for the field that you want to change, and select the new type from the Storage list.
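To make the inference step more concrete, the following Python sketch guesses a storage type from a handful of sampled text values. It is a simplified illustration under its own assumptions (try Integer, then Real, otherwise String), not the algorithm that Synthetic Data Generator uses. The helpers infer_storage and parses_as are hypothetical, and the sample values are made up.

```python
def parses_as(convert, text):
    """Return True if convert(text) succeeds, for example int("23")."""
    try:
        convert(text)
    except ValueError:
        return False
    return True


def infer_storage(values):
    """Guess Integer, Real, or String from sampled text values.

    Illustrative only: this is NOT Synthetic Data Generator's inference logic.
    An empty sample defaults to Integer in this sketch.
    """
    storage = "Integer"
    for raw in values:
        text = raw.strip()
        if not text:
            continue  # skip missing values
        if parses_as(int, text):
            continue  # still consistent with the current type
        if parses_as(float, text):
            storage = "Real"  # one decimal value widens Integer to Real
        else:
            return "String"  # any non-numeric value forces String
    return storage


# Made-up sample values for fields named in the example above
sample = {
    "Age": ["23", "47.0", "61"],
    "Na": ["0.79", "0.74"],
    "Sex": ["F", "M"],
}
for field in ("Age", "Na", "Sex"):
    print("{}: {}".format(field, infer_storage(sample[field])))
```

With these values, Age is reported as Real because of the single 47.0 entry, which is the kind of result you might then override to Integer.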

To override storage types with a script, set the custom_field_storage property of the Import node. In the following example, importnode1 is the Import node object in your flow:

importnode1.setPropertyValue("custom_field_storage", "[[Age, false, Integer], [Sex, false, String], [BP, false, String], [Cholesterol, false, String], [Na, false, Real], [K, false, Real], [Drug, false, String]]")
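If you override many fields, you might prefer to build the custom_field_storage string from a list instead of typing it by hand. The following Python sketch uses a hypothetical helper, build_custom_field_storage, that reproduces the format of the example above. The middle false value in each entry is copied verbatim from that example; its meaning is not described in this section.

```python
def build_custom_field_storage(fields):
    """Build a custom_field_storage string in the format shown in the example.

    fields is a list of (field_name, storage_type) pairs. The "false" value in
    each entry is copied from the documented example; its meaning is assumed
    to stay the same for every field here.
    """
    entries = ["[{}, false, {}]".format(name, storage) for name, storage in fields]
    return "[" + ", ".join(entries) + "]"


value = build_custom_field_storage([
    ("Age", "Integer"), ("Sex", "String"), ("BP", "String"),
    ("Cholesterol", "String"), ("Na", "Real"), ("K", "Real"), ("Drug", "String"),
])
print(value)
```

You could then pass value to setPropertyValue("custom_field_storage", value) on your Import node object.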

Scripting with the Import node

You can use scripting languages, like Python, to programmatically set properties for nodes.

Import node properties

The following properties are specific to the Import node. For information about common node properties, see Properties for flows and nodes.

Table 1. Node properties for scripting

Property name | Data type | Property description
asset_type | String | The type of source to import from. You must specify either DataAsset or Connection.
asset_id | String | When asset_type is DataAsset, the ID of the data asset. You must provide an ID if you use a data asset.
asset_name | String | When asset_type is DataAsset, the name of the data asset.
connection_id | String | When asset_type is Connection, the ID of the database connection. You must provide an ID if you use a connection.
connection_name | String | When asset_type is Connection, the name of the connection.
connection_path | String | When asset_type is Connection, the path to the table in the connection. Depending on the database connection, the path includes the catalog and schema, for example catalog_name/schema_name/table_name.
user_settings | String | Escaped JSON string that contains the interaction properties for the connection, for example:
"{\"interactionProperties\":{\"file_format\":\"csv\",\"encoding\":\"UTF-8\",\"first_line_header\":true,\"infer_schema\":true,\"infer_record_count\":1000,\"infer_as_varchar\":false,\"invalid_data_handling\":\"fail\",\"file_name\":\"input.csv\"}}"
The available interaction properties and their values vary with the type of connection you use.
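The requirements in Table 1 can be expressed as a small validation check. The following Python sketch is a hypothetical helper (check_import_properties is not part of Synthetic Data Generator) that you might run on a dictionary of property values before you set them on a node. It checks only the rules stated in the table, and the connection values are placeholders.

```python
VALID_ASSET_TYPES = ("DataAsset", "Connection")


def check_import_properties(props):
    """Return a list of problems found in a dict of Import node properties.

    Rules taken from Table 1: asset_type must be DataAsset or Connection,
    a DataAsset needs asset_id, and a Connection needs connection_id.
    """
    problems = []
    asset_type = props.get("asset_type")
    if asset_type not in VALID_ASSET_TYPES:
        problems.append("asset_type must be DataAsset or Connection")
    elif asset_type == "DataAsset" and not props.get("asset_id"):
        problems.append("asset_id is required when asset_type is DataAsset")
    elif asset_type == "Connection" and not props.get("connection_id"):
        problems.append("connection_id is required when asset_type is Connection")
    return problems


# Placeholder values for illustration only
connection_props = {
    "asset_type": "Connection",
    "connection_id": "<connection id>",
    # catalog/schema/table, as described for connection_path
    "connection_path": "/".join(["catalog_name", "schema_name", "table_name"]),
}
print(check_import_properties(connection_props))  # → []
print(check_import_properties({"asset_type": "DataAsset"}))
```

The second call reports that asset_id is missing, because a data asset source requires an ID.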
Tip:

If you need to find the asset_id for a data asset, look at the URL when you open the data asset in your project. For example, in this URL .../data-assets/d8a5a919-kke0-45a1-55hh-6fe5c44321f0/preview?context=wx the asset_id is d8a5a919-kke0-45a1-55hh-6fe5c44321f0.
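If you copy asset URLs often, the lookup in the tip can be automated. The following Python sketch assumes, based only on the example URL, that the asset_id is the path segment that directly follows data-assets/. The helper asset_id_from_url is hypothetical, and the URL keeps the truncated prefix from the example.

```python
import re


def asset_id_from_url(url):
    """Return the path segment after "data-assets/", or None if absent.

    Assumes the URL layout shown in the tip; other URL formats may differ.
    """
    match = re.search(r"data-assets/([^/?#]+)", url)
    return match.group(1) if match else None


url = ".../data-assets/d8a5a919-kke0-45a1-55hh-6fe5c44321f0/preview?context=wx"
print(asset_id_from_url(url))
```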

Example

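The following script updates the sheet_name interaction property in the user_settings property of an Import node. Replace the placeholders in angle brackets with your own values.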

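import json

stream = sdg.script.stream()

# Find the Import node in the flow by its node ID
importnode = stream.findByID("<import nodeId>")
# Load the escaped JSON string in user_settings as a Python dictionary
userSettings = json.loads(importnode.getPropertyValue("user_settings"))
# Set the sheet to read (for example, when the source is a spreadsheet file)
userSettings["interactionProperties"]["sheet_name"] = "<new sheet name>"
# Convert the dictionary back to a JSON string and store it on the node
importnode.setPropertyValue("user_settings", json.dumps(userSettings))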
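To try the load, modify, and dump pattern outside a flow, the following standalone Python sketch applies it to the user_settings string from Table 1. It does not call any Synthetic Data Generator API, and changing infer_record_count to 5000 is an arbitrary example edit, not a recommended value.

```python
import json

# The user_settings value from Table 1, as a plain (unescaped) Python string
user_settings = (
    '{"interactionProperties":{"file_format":"csv","encoding":"UTF-8",'
    '"first_line_header":true,"infer_schema":true,"infer_record_count":1000,'
    '"infer_as_varchar":false,"invalid_data_handling":"fail","file_name":"input.csv"}}'
)

# Same steps as the script: parse, change one key, serialize again
settings = json.loads(user_settings)
settings["interactionProperties"]["infer_record_count"] = 5000
updated = json.dumps(settings)

print(updated)
```

json.dumps can change the whitespace of the string, but every key and value other than the one you edit is preserved.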