Generating structured synthetic data
Synthetic Data Generator generates data that mimics real-world data. Organizations can use synthetic data to protect sensitive information while still supporting testing, development, and analysis, which helps to meet data privacy and compliance needs.
You can use your existing data as the basis for structured synthetic data. Synthetic Data Generator generates synthetic data that mimics the features and relationships in the real data.
- Cloud platforms
- Data format
- Tabular: Tables in data files such as .xls, .csv, or .json
- Learn more about Data sources for Synthetic Data Generator.
- Data size
- The Synthetic Data Generator environment can import up to approximately 2.5 GB of data.
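As a rough illustration of what it can mean for synthetic data to mimic the features and relationships in seed data, the following Python sketch summarizes a small table and samples new rows from that summary. It is not the method that Synthetic Data Generator uses. The function names (`fit_profile`, `generate_rows`), the column names, and the sample values are all hypothetical, and the approach (category frequencies plus a per-category normal distribution) is a simplifying assumption.

```python
import random
import statistics
from collections import defaultdict


def fit_profile(rows, category_key, numeric_key):
    """Summarize seed rows: category counts plus per-category mean and stdev."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[numeric_key])
    profile = {}
    for category, values in groups.items():
        profile[category] = {
            "weight": len(values),
            "mean": statistics.mean(values),
            # stdev needs at least two values; fall back to 0.0 otherwise
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return profile


def generate_rows(profile, category_key, numeric_key, count, seed=0):
    """Sample synthetic rows that follow the category mix and numeric spread."""
    rng = random.Random(seed)  # fixed seed keeps the output repeatable
    categories = list(profile)
    weights = [profile[c]["weight"] for c in categories]
    rows = []
    for _ in range(count):
        category = rng.choices(categories, weights=weights, k=1)[0]
        stats = profile[category]
        value = rng.gauss(stats["mean"], stats["stdev"])
        rows.append({category_key: category, numeric_key: round(value, 2)})
    return rows


# Hypothetical seed data: spend depends on the plan (the "relationship").
seed_rows = [
    {"plan": "basic", "monthly_spend": 10.0},
    {"plan": "basic", "monthly_spend": 12.0},
    {"plan": "basic", "monthly_spend": 11.0},
    {"plan": "premium", "monthly_spend": 48.0},
    {"plan": "premium", "monthly_spend": 52.0},
]
profile = fit_profile(seed_rows, "plan", "monthly_spend")
synthetic_rows = generate_rows(profile, "plan", "monthly_spend", count=100, seed=42)
```

None of the synthetic rows is copied from the seed rows, but the mix of plans and the typical spend for each plan follow the seed data.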
What is Synthetic Data Generator?
Synthetic Data Generator is a graphical flow editor tool. You can build Synthetic Data Generator flows to generate structured synthetic data by using the visual interface. Programming is not required.
The Synthetic Data Generator graphical flow editor.

You have the following options for generating data with Synthetic Data Generator:
- Use Synthetic Data Generator to mask and mimic your production data, and then generate synthetic tabular data that is based on it.
- Use Synthetic Data Generator to define a custom data schema, and then generate synthetic data that is based on your requirements.
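The second option, generating data from a custom schema, can be sketched in plain Python. This is only a conceptual illustration under stated assumptions: the `SCHEMA` layout, the field names, the supported types, and the helper functions are hypothetical and do not reflect how Synthetic Data Generator stores or interprets a schema.

```python
import random

# Hypothetical schema: each field names a type and the constraints for it.
SCHEMA = {
    "customer_id": {"type": "int", "min": 1000, "max": 9999},
    "region": {"type": "category", "values": ["north", "south", "east", "west"]},
    "balance": {"type": "float", "min": 0.0, "max": 5000.0},
}


def generate_value(spec, rng):
    """Draw one value that satisfies a single field specification."""
    kind = spec["type"]
    if kind == "int":
        return rng.randint(spec["min"], spec["max"])
    if kind == "float":
        return round(rng.uniform(spec["min"], spec["max"]), 2)
    if kind == "category":
        return rng.choice(spec["values"])
    raise ValueError(f"Unsupported field type: {kind}")


def generate_table(schema, row_count, seed=0):
    """Build a list of rows, one dict per row, from the schema alone."""
    rng = random.Random(seed)  # fixed seed keeps the output repeatable
    return [
        {name: generate_value(spec, rng) for name, spec in schema.items()}
        for _ in range(row_count)
    ]


table = generate_table(SCHEMA, row_count=5, seed=7)
for row in table:
    print(row)
```

Because no seed data is involved, the generated rows reflect only the constraints in the schema.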
Building flows
In Synthetic Data Generator, you set up flows to import seed data and generate synthetic data from it. A flow is a series of nodes that you connect on the canvas.
- Flow
- A flow is a group of data-processing operations that are connected in sequence. Data moves from the data source through each operation to the end of the flow. A flow usually starts with a node that imports seed data and ends with a node that exports the synthetic data. You create a flow by adding nodes to the canvas and connecting them.
- Canvas
- The canvas is the main work area in Synthetic Data Generator, and it is where you build your flows.
- Nodes
- A node is a modular, self-contained set of operations. Each node represents its operations graphically and has a unique icon. You link nodes together on the canvas in a flow to build more complex processing and data generation.
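The relationship between flows and nodes can be pictured as a chain of functions, where each node receives the output of the previous one. The following Python sketch models that idea only. The node names (`import_node`, `shuffle_column_node`, `export_node`), the `run_flow` helper, and the sample CSV are hypothetical, and the shuffle step is a crude stand-in for a real generation node, not the product's behavior.

```python
import csv
import io
import random


def import_node(csv_text):
    """Node that starts a flow: ignores its input and reads seed rows from CSV text."""
    def run(_):
        return list(csv.DictReader(io.StringIO(csv_text)))
    return run


def shuffle_column_node(column, seed=0):
    """Node that shuffles one column across rows (a stand-in for a generate step)."""
    def run(rows):
        rng = random.Random(seed)
        values = [row[column] for row in rows]
        rng.shuffle(values)
        return [{**row, column: value} for row, value in zip(rows, values)]
    return run


def export_node():
    """Node that ends a flow: writes the rows back out as CSV text."""
    def run(rows):
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    return run


def run_flow(nodes):
    """Pass data through each node in order, as data moves through a flow."""
    data = None
    for node in nodes:
        data = node(data)
    return data


SEED_CSV = "name,age\nAna,34\nBo,51\nCy,27\n"
output = run_flow([
    import_node(SEED_CSV),
    shuffle_column_node("age", seed=1),
    export_node(),
])
print(output)
```

Each node is self-contained, so a step can be swapped or reordered without changing the others, which is the same property that makes nodes reusable on the canvas.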
Scripting
You can use scripting in Synthetic Data Generator to automate tasks that are highly repetitive or time-consuming to perform manually. Scripts can perform the same types of actions that a user performs with a mouse or keyboard. You can write scripts in Python or Python for Spark.