Import node
In Synthetic Data Generator, you can use the Import node to import source data from databases or files to use as sample seed data.
- Description
- Use the Import node in a Synthetic Data Generator flow to bring in the source data that you can generate synthetic data from. You can import data from a database or file saved as a project asset.
- Using the node
- The Import node is usually the first node in a Synthetic Data Generator flow. Unless you are using a custom schema, you need to import data before you can generate synthetic data.
- It imports datasets that are stored in a structured format, such as a table, and then passes the dataset to the next node for further processing. For more information, see Data sources for Synthetic Data Generator.
- Although you can add several Import nodes to a Synthetic Data Generator flow, you cannot use multiple nodes to combine data.
- Mandatory or optional
- The Import node is mandatory if you're using production data as the seed for the synthetic data. If you choose to generate synthetic data from a custom schema, then you only need the Generate node.
Overriding storage type
When Synthetic Data Generator imports data, it processes a sample of the records in the data to infer the structure of the data and the types of data. During this process, Synthetic Data Generator sets the storage type that is required for the data in each field. For example, it might infer that an Age field needs the Real storage type. However, if the inference is incorrect, you can override it.
- In the settings for the Import node, expand the Storage section.
- Select Override for the field that you want to change, and select the new type from the Storage list.
If you want to use scripting to override the storage type, use the following example to add the parameters to your script:
importnode1.setPropertyValue("custom_field_storage", "[[Age, false, Integer], [Sex, false, String], [BP, false, String], [Cholesterol, false, String], [Na, false, Real], [K, false, Real], [Drug, false, String]]")
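When many fields need an override, it can be easier to build the custom_field_storage string from a list than to type it by hand. The following is a minimal sketch, not an official API: build_field_storage is a hypothetical helper name, and the middle value of each entry is copied as false from the example above because this topic does not describe what it controls.

```python
def build_field_storage(fields):
    """Build a custom_field_storage string from (field name, storage type) pairs.

    Assumption: the middle value in each entry is always "false", copied from
    the documented example; its meaning is not described in this topic.
    """
    entries = ["[{}, false, {}]".format(name, storage) for name, storage in fields]
    return "[" + ", ".join(entries) + "]"


overrides = [("Age", "Integer"), ("Na", "Real"), ("Drug", "String")]
storage_value = build_field_storage(overrides)
print(storage_value)
```

Inside a flow script, the resulting string could then be passed as the second argument of setPropertyValue("custom_field_storage", ...), as in the example above.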
Scripting with the Import node
You can use scripting languages, like Python, to programmatically set properties for nodes.
Import node properties
The following properties are specific to the Import node. For information about common node properties, see Properties for flows and nodes.
| Property name | Data type | Property description |
|---|---|---|
| asset_type | String | Specifies the type of source asset. You must specify one of these values: DataAsset or Connection. |
| asset_id | String | When DataAsset is set for the asset_type, this is the ID of the asset. You must provide an ID if you use a data asset. |
| asset_name | String | When DataAsset is set for the asset_type, this is the name of the asset. |
| connection_id | String | When Connection is set for the asset_type, this is the ID of the database connection. You must provide an ID if you use a connection. |
| connection_name | String | When Connection is set for the asset_type, this is the name of the connection. |
| connection_path | String | When Connection is set for the asset_type, this is the path to the table in the connection. Depending on the database connection, the path includes the catalog and schema, for example catalog_name/schema_name/table_name. |
| user_settings | String | Escaped JSON string containing the interaction properties for the connection, for example: "{\"interactionProperties\":{\"file_format\":\"csv\",\"encoding\":\"UTF-8\",\"first_line_header\":true,\"infer_schema\":true,\"infer_record_count\":1000,\"infer_as_varchar\":false,\"invalid_data_handling\":\"fail\",\"file_name\":\"input.csv\"}}" These values change based on the type of connection that you're using. |
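The table implies a simple rule: asset_type selects which ID is required. The sketch below expresses that rule as a check you might run before setting the properties on a node. It is illustrative only; check_import_properties is a hypothetical helper and is not part of the Synthetic Data Generator scripting API.

```python
def check_import_properties(props):
    """Return a list of problems found in a dict of Import node properties."""
    problems = []
    asset_type = props.get("asset_type")
    if asset_type == "DataAsset":
        # A data asset must be identified by its ID; asset_name alone is not enough.
        if not props.get("asset_id"):
            problems.append("asset_id is required when asset_type is DataAsset")
    elif asset_type == "Connection":
        # A connection must be identified by its ID.
        if not props.get("connection_id"):
            problems.append("connection_id is required when asset_type is Connection")
    else:
        problems.append("asset_type must be DataAsset or Connection")
    return problems


print(check_import_properties({"asset_type": "Connection", "connection_name": "sales_db"}))
```

An empty list means that the required ID for the chosen asset_type is present.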
If you need to find the asset_id for a data asset, you can look at the URL when you open the data asset from your projects. For example, in this URL .../data-assets/d8a5a919-kke0-45a1-55hh-6fe5c44321f0/preview?context=wx the asset_id is d8a5a919-kke0-45a1-55hh-6fe5c44321f0.
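If you copy the URL from the browser, the ID can also be extracted with a short script instead of by eye. This is a minimal sketch that assumes the ID is always the path segment that directly follows /data-assets/, as in the URL above; asset_id_from_url is a hypothetical helper name.

```python
import re


def asset_id_from_url(url):
    """Return the path segment that follows /data-assets/, or None if absent."""
    match = re.search(r"/data-assets/([^/?#]+)", url)
    return match.group(1) if match else None


url = ".../data-assets/d8a5a919-kke0-45a1-55hh-6fe5c44321f0/preview?context=wx"
print(asset_id_from_url(url))
```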
Example
The following is an example of the properties used in a script.
import json
stream = sdg.script.stream()
importnode = stream.findByID("<import nodeId>")
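# Load the escaped JSON string into a Python dictionary
userSettings = json.loads(importnode.getPropertyValue("user_settings"))
# Change one interaction property, here the sheet to read
userSettings["interactionProperties"]["sheet_name"] = "<new sheet name>"
# Serialize the dictionary back to a string and write it to the node
importnode.setPropertyValue("user_settings", json.dumps(userSettings))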
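The load, modify, and dump steps in the script above do not depend on the flow, so they can be factored into a plain function and tried outside Synthetic Data Generator. The following sketch assumes only the standard json module; with_interaction_property is a hypothetical helper name, and the sample string uses keys from the user_settings example in the properties table.

```python
import json


def with_interaction_property(user_settings, key, value):
    """Return a new user_settings string with one interaction property changed."""
    settings = json.loads(user_settings)
    # Create the interactionProperties object if the string does not have one yet.
    settings.setdefault("interactionProperties", {})[key] = value
    return json.dumps(settings)


original = '{"interactionProperties": {"file_format": "csv", "first_line_header": true}}'
updated = with_interaction_property(original, "file_name", "input.csv")
print(updated)
```

In a flow script, the value returned by the function would be the string that is passed to setPropertyValue("user_settings", ...).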