Generating synthetic data

Synthetic Data Generator generates data that mimics real-world data. Organizations can use synthetic data to protect sensitive information while still allowing for robust testing, development, and analysis, which helps to support data privacy and compliance needs.

You have the following options for generating data with Synthetic Data Generator:

  • Use Synthetic Data Generator to mask and mimic your production data, and then generate synthetic tabular data that is based on that production data.
  • Use Synthetic Data Generator to define a custom data schema, and then generate synthetic data that is based on your requirements.
  • Create a Synthetic Data Generator job to generate unstructured synthetic data that is based on sample data.
  • Use the watsonx.ai synthetic data generation API to generate unstructured synthetic data that is based on a sample.

Data format for structured data
    Tabular: Tables in data files such as .xls, .csv, or .json.
    Learn more about Data sources for Synthetic Data Generator.

Data format for unstructured data
    Seed data files such as .yaml, .pdf, and .md.
    Learn more about Data builder pipelines and seed data.

Data size for structured data
    The Synthetic Data Generator environment can import up to approximately 2.5 GB of data.

Data size for unstructured data
    Data can be any size.
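
The format and size guidance above can be checked locally before you import a file. The following is a minimal sketch only, not part of Synthetic Data Generator or the watsonx.ai API. The function name `check_input_file` is hypothetical, the extension sets contain only the examples listed above (the actual supported formats might be broader), and the limit is assumed to be 2.5 GiB expressed in bytes because the documented value is approximate.

```python
from pathlib import Path
from typing import Optional

# Assumption: only the example extensions named in this topic are listed here.
STRUCTURED_EXTENSIONS = {".xls", ".csv", ".json"}
UNSTRUCTURED_EXTENSIONS = {".yaml", ".pdf", ".md"}

# Assumption: "approximately 2.5 GB" is treated as 2.5 * 1024**3 bytes.
STRUCTURED_LIMIT_BYTES = int(2.5 * 1024**3)


def check_input_file(path: str, size_bytes: Optional[int] = None) -> str:
    """Classify a file by extension and compare its size with the stated limit.

    If size_bytes is not given, the size is read from the file on disk.
    """
    file_path = Path(path)
    extension = file_path.suffix.lower()
    if size_bytes is None:
        size_bytes = file_path.stat().st_size

    if extension in STRUCTURED_EXTENSIONS:
        if size_bytes > STRUCTURED_LIMIT_BYTES:
            return "structured: exceeds approximate import limit"
        return "structured: within approximate import limit"
    if extension in UNSTRUCTURED_EXTENSIONS:
        return "unstructured: no size limit stated"
    return "unknown: extension not in the listed examples"
```

For example, `check_input_file("sales.csv")` reads the size of a local file named sales.csv, and `check_input_file("sales.csv", size_bytes=1000)` checks a known size without touching the disk. An "unknown" result means only that the extension is not among the examples in this topic, not that the file is unsupported.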

Figure: Synthetic Data Generator overview, showing the Synthetic Data Generator graphical flow editor.

Learn more