Generating structured synthetic data from sample data
In Synthetic Data Generator, you set up flows to import sample seed data and generate synthetic data from it. You can add different nodes to the flow to customize how the sample data is processed before synthetic data is generated.
Prerequisites
Before you can create synthetic data, you need to create a project. For more information, see Creating a project.
Procedure
-
Access Synthetic Data Generator from within a project. Click New asset > Generate synthetic tabular data to create a synthetic data flow.
-
In the Generate synthetic tabular data window, add a name for the asset and click Create. A new session starts and the flow opens. It might take a few minutes to create the session.
-
In the Welcome to Synthetic Data Generator dialog, select First time user to open the Generate synthetic tabular data flow window, which helps you set up your first flow.
Optional: If you want to start with a blank canvas in Synthetic Data Generator and set up your own flow, click Experienced user.
-
Select Leverage your existing data, and click Next.
-
In the Generate synthetic tabular data flow dialog box, configure the settings for the new flow:
- In the Import data tab, find and import the data asset in your project that you want to use. Synthetic Data Generator uses this sample data as the basis for the synthetic data. For more information about data assets, see Data sources for Synthetic Data Generator.
- In the Anonymize tab, select the columns with data that you want to mask. Masking can disguise column names, column values, or both.
- In the Mimic tab, configure parameters for how Synthetic Data Generator generates the synthetic data.
- In the Evaluate tab, configure parameters for how Synthetic Data Generator evaluates the quality of the synthetic data that it produces.
- In the Export data tab, choose a file format to save the synthetic data in. For more information, see Exporting synthetic data.
-
On the Review tab, check your settings and click Save flow.
-
To run your new flow immediately, click Run flow.
You can find the synthetic data that was generated in your project assets.
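To make the stages of the flow more concrete, the following Python sketch walks through the same sequence (import, anonymize, mimic, evaluate, export) on a tiny in-memory table. It is an illustration only and does not show how Synthetic Data Generator works internally: the sample data, the function names, the hash-based masking rule, and the independent per-column sampling are all assumptions made for this example. In particular, sampling each column independently keeps the values seen in each column but does not preserve relationships between columns.

```python
import csv
import hashlib
import io
import random
import statistics

# Hypothetical seed data standing in for a project data asset.
SAMPLE_CSV = """name,age,city
Ana,34,Lisbon
Ben,45,Oslo
Chen,29,Lisbon
Dana,52,Oslo
"""


def import_data(text):
    """Import stage: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def mask_value(value, salt="demo-salt"):
    """Replace a value with a short, stable pseudonym (illustrative rule)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]


def anonymize(rows, columns):
    """Anonymize stage: mask the values of the selected columns."""
    return [
        {key: mask_value(val) if key in columns else val for key, val in row.items()}
        for row in rows
    ]


def mimic(rows, n, seed=0):
    """Mimic stage: draw each column independently from the observed values."""
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    return [{col: rng.choice(rows)[col] for col in columns} for _ in range(n)]


def evaluate(seed_rows, synthetic_rows, numeric_column):
    """Evaluate stage: absolute difference between the column means."""
    seed_mean = statistics.mean(float(r[numeric_column]) for r in seed_rows)
    synth_mean = statistics.mean(float(r[numeric_column]) for r in synthetic_rows)
    return abs(seed_mean - synth_mean)


def export_csv(rows):
    """Export stage: serialize the rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


seed_rows = import_data(SAMPLE_CSV)
masked_rows = anonymize(seed_rows, {"name"})
synthetic_rows = mimic(masked_rows, n=100, seed=42)
mean_gap = evaluate(masked_rows, synthetic_rows, "age")
output = export_csv(synthetic_rows)
```

A small `mean_gap` suggests that the synthetic `age` column stays close to the seed data on average. A real quality evaluation would compare many more properties than a single mean.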