Anonymize node

In Synthetic Data Generator, you can use the Anonymize node to mask sensitive data by replacing it with an artificial, yet realistic, dataset.

Description: The Anonymize node in a Synthetic Data Generator flow protects sensitive information within the dataset by replacing it with artificial data. The artificial data looks realistic enough to be used in the place of actual data. This helps maintain privacy while preserving the usefulness of the dataset for generating synthetic data.
Using the node: The Anonymize node usually comes after the Import node in the Synthetic Data Generator flow. It takes the source data and anonymizes the columns that you select. All downstream nodes then use the anonymized values in place of the actual data. You need only one Anonymize node in your Synthetic Data Generator flow to mask the source data, because you can select any of the available columns in that one node.
Mandatory or optional: The Anonymize node is optional, but it's highly recommended when dealing with sensitive data to ensure privacy compliance. This node might be required if you plan to share or use the dataset outside controlled environments and need to protect individual privacy.

Scripting with the Anonymize node

You can use scripting languages, such as Python, to programmatically set properties for nodes.

Anonymize node properties

The following properties are specific to the Anonymize node. For information about common node properties, see Properties for flows and nodes.

All properties of an Anonymize node are structured properties in addition to their listed data types, so each value is set for a specific field.

Table 1. Node properties for scripting
Property name	Data type	Property description
`enable_anonymize`	Flag	Set to `True` to activate anonymization of field values (equivalent to selecting Yes for that field in the Anonymize Values column).
`use_prefix`	Flag	Set to `True` to use a custom prefix that you specify. Applies to fields that will be anonymized by the Hash method. It is equivalent to choosing the Custom option in the Replace Values settings for that field.
`prefix`	String	The custom prefix that you want to use. It is equivalent to typing a prefix into the text box in the Replace Values settings. If you do not specify a value, the default prefix is used.
`transformation`	Random, Fixed	Determines whether the transformation parameters for a field anonymized by the Transform method will be random or fixed.
`set_random_seed`	Flag	When set to `True`, the specified seed value will be used if `transformation` is also set to `Random`.
`random_seed`	Integer	When `set_random_seed` is set to `True`, this is the seed for the random number.
`scale`	Number	When `transformation` is set to `Fixed`, this value is used for "scale by." The maximum scale value is normally 10, but you can reduce it to avoid overflow.
`translate`	Number	When `transformation` is set to `Fixed`, this value is used for "translate." The maximum translate value is normally 1000, but you can reduce it to avoid overflow.
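
The dependencies among the properties in Table 1 can be summarized as a small consistency check. The following is a minimal sketch in plain Python and is not part of the scripting API: `check_field_settings` is a hypothetical helper, the limits of 10 and 1000 are the normal maximums stated in the table, and treating `prefix` as ignored when `use_prefix` is off is an assumption drawn from the property descriptions.

```python
# Hypothetical helper, not part of the scripting API: it only restates
# the dependencies described in Table 1 for a single field.
MAX_SCALE = 10        # normal maximum for "scale by" per Table 1
MAX_TRANSLATE = 1000  # normal maximum for "translate" per Table 1


def check_field_settings(settings):
    """Return a list of warnings for one field's Anonymize settings."""
    warnings = []
    transformation = settings.get("transformation")

    if transformation not in (None, "Random", "Fixed"):
        warnings.append("transformation must be Random or Fixed")

    # A seed only takes effect for Random transformations
    # when set_random_seed is also True.
    if "random_seed" in settings:
        if not settings.get("set_random_seed") or transformation != "Random":
            warnings.append("random_seed has no effect")

    # scale and translate are only used for Fixed transformations.
    for name, limit in (("scale", MAX_SCALE), ("translate", MAX_TRANSLATE)):
        if name in settings:
            if transformation != "Fixed":
                warnings.append(name + " has no effect")
            elif settings[name] > limit:
                warnings.append(name + " exceeds the normal maximum")

    # Assumption: a prefix is only applied when use_prefix is True.
    if "prefix" in settings and not settings.get("use_prefix"):
        warnings.append("prefix has no effect")

    return warnings
```

An empty list means that the combination is consistent with the table. For example, a field with `transformation` set to `Fixed` and a `random_seed` value would produce a warning that the seed has no effect.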

Example

The following example shows the properties used in a script.

stream = sdg.script.stream()
typenode = stream.findByID("id42KW3MSA94B")

node = stream.createAt("anonymize", "My node", 192, 96)
stream.link(typenode, node)

# Anonymize node properties are set per input field, so each call names the field
node.setKeyedPropertyValue("enable_anonymize", "Age", True)
node.setKeyedPropertyValue("transformation", "Age", "Random")
node.setKeyedPropertyValue("set_random_seed", "Age", True)
node.setKeyedPropertyValue("random_seed", "Age", 123)
node.setKeyedPropertyValue("enable_anonymize", "Drug", True)
node.setKeyedPropertyValue("use_prefix", "Drug", True)
node.setKeyedPropertyValue("prefix", "Drug", "myprefix")
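
The example above uses a Random transformation for the Age field. For a Fixed transformation, the `scale` and `translate` properties from Table 1 would be set instead of the seed properties. The following sketch assumes the same `setKeyedPropertyValue` call that the example uses. `RecordingNode` is a hypothetical stand-in that lets the snippet run outside the product, and the values 2 and 100 are arbitrary illustrations.

```python
class RecordingNode(object):
    """Hypothetical stand-in for an Anonymize node; it only records calls."""

    def __init__(self):
        self.keyed = {}

    def setKeyedPropertyValue(self, prop, key, value):
        # Store the value under (property name, field name).
        self.keyed[(prop, key)] = value


# In a real flow, the node would come from stream.createAt(...) as in the example.
node = RecordingNode()

# Fixed transformation for the Age field (values chosen for illustration only)
node.setKeyedPropertyValue("enable_anonymize", "Age", True)
node.setKeyedPropertyValue("transformation", "Age", "Fixed")
node.setKeyedPropertyValue("scale", "Age", 2)
node.setKeyedPropertyValue("translate", "Age", 100)
```

Inside a flow script, only the four `setKeyedPropertyValue` calls would be needed. The stand-in class exists so that the property names and per-field keys can be checked without a running flow.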