Anonymize node
In Synthetic Data Generator, you can use the Anonymize node to mask sensitive data by replacing it with an artificial, yet realistic, dataset.
- Description
- The Anonymize node in a Synthetic Data Generator flow protects sensitive information within the dataset by replacing it with artificial data. The artificial data looks realistic enough to be used in place of actual data. This helps maintain privacy while preserving the usefulness of the dataset for generating synthetic data.
- Using the node
- The Anonymize node usually comes after the Import node in the Synthetic Data Generator flow. It takes the source data and anonymizes the columns that you select. The anonymized values are then used for all downstream nodes in place of the actual data.
- You only need one Anonymize node in your Synthetic Data Generator flow to mask the source data. You can select any of the available columns in that single node.
- Mandatory or optional
- The Anonymize node is optional, but it's highly recommended when dealing with sensitive data to ensure privacy compliance. This node might be required if you plan to share or use the dataset outside controlled environments and need to protect individual privacy.
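The column-level behavior described under Using the node can be illustrated with a short plain-Python sketch. Only the selected columns are replaced, equal inputs map to equal outputs, and later steps work only with the masked copy. This sketch does not use the Synthetic Data Generator API. The function name `anonymize_columns`, the SHA-256 hash, the 8-character token length, and the `anon_` default prefix are all illustrative assumptions, not the node's documented algorithm.

```python
import hashlib


def anonymize_columns(rows, columns, prefix="anon_", length=8):
    """Return a copy of rows with the values in `columns` replaced by hashed tokens.

    Hypothetical illustration: SHA-256 and the token format are assumptions,
    not the Anonymize node's actual hashing scheme.
    """
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows stay untouched
        for col in columns:
            if col in new_row:
                digest = hashlib.sha256(str(new_row[col]).encode("utf-8")).hexdigest()
                # The same input always yields the same token, so joins and
                # group-bys on the masked column still line up downstream.
                new_row[col] = prefix + digest[:length]
        masked.append(new_row)
    return masked


source = [
    {"Age": 23, "Drug": "drugA"},
    {"Age": 47, "Drug": "drugB"},
    {"Age": 35, "Drug": "drugA"},
]
# Only "Drug" is selected, so "Age" passes through unchanged.
masked = anonymize_columns(source, ["Drug"], prefix="myprefix_")
for row in masked:
    print(row)
```

An unsalted hash of a small set of values, such as drug names, can be reversed by hashing every candidate value. A production scheme would therefore normally add a secret key, for example with the standard `hmac` module.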
Scripting with the Anonymize node
You can use scripting languages, like Python, to programmatically set properties for nodes.
Anonymize node properties
The following properties are specific to the Anonymize node. For information about common node properties, see Properties for flows and nodes.
All Anonymize node properties are structured properties, in addition to having the data types listed in the following table. Each value is set for a specific field, using the field name as the key.
| Property name | Data type | Property description |
|---|---|---|
| enable_anonymize | Flag | Set to True to activate anonymization of field values (equivalent to selecting Yes for that field in the Anonymize Values column). |
| use_prefix | Flag | Set to True to use a custom prefix that you specify. Applies to fields that are anonymized by the Hash method. It is equivalent to choosing the Custom option in the Replace Values settings for that field. |
| prefix | String | The custom prefix that you want to use. It is equivalent to typing a prefix into the text box in the Replace Values settings. If no custom prefix is specified, the default prefix is used. |
| transformation | Random, Fixed | Determines whether the transformation parameters for a field that is anonymized by the Transform method are random or fixed. |
| set_random_seed | Flag | When set to True and transformation is set to Random, the value of random_seed is used as the seed. |
| random_seed | Integer | The seed for the random number generator, used when set_random_seed is set to True. |
| scale | Number | When transformation is set to Fixed, this value is used for "scale by." The maximum scale value is normally 10, but you can reduce it to avoid overflow. |
| translate | Number | When transformation is set to Fixed, this value is used for "translate." The maximum translate value is normally 1000, but you can reduce it to avoid overflow. |
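The way the Transform-method properties work together (transformation, set_random_seed, random_seed, scale, and translate) can be sketched in plain Python. This sketch makes several assumptions. It assumes a linear transform of the form x' = (x + translate) * scale. It assumes that random parameters are drawn uniformly, with scale in [1, 10] and translate in [0, 1000], bounded by the maxima in the table. The node's actual formula and sampling may differ. The function name `transform_values` is hypothetical.

```python
import random


def transform_values(values, transformation="Random", seed=None,
                     scale=1.0, translate=0.0,
                     max_scale=10.0, max_translate=1000.0):
    """Apply a scale-and-translate transform to numeric values.

    Hypothetical sketch: assumes x' = (x + translate) * scale, which may not
    match the formula that the Anonymize node actually uses.
    """
    if transformation == "Random":
        # With a seed, the drawn parameters (and so the output) are repeatable,
        # which mirrors set_random_seed=True plus a random_seed value.
        rng = random.Random(seed)
        # Assumed sampling ranges, bounded by the documented maxima.
        scale = rng.uniform(1.0, max_scale)
        translate = rng.uniform(0.0, max_translate)
    elif transformation == "Fixed":
        # Assumption: scale must be positive so that value order is preserved.
        if not 0 < scale <= max_scale:
            raise ValueError(f"scale must be in (0, {max_scale}]")
        if abs(translate) > max_translate:
            raise ValueError(f"translate must be at most {max_translate} in magnitude")
    else:
        raise ValueError("transformation must be 'Random' or 'Fixed'")
    return [(x + translate) * scale for x in values]


ages = [23, 47, 35]
print(transform_values(ages, "Fixed", scale=2, translate=10))  # [66, 114, 90]
print(transform_values(ages, "Random", seed=123))  # same output on every run
```

Because scale is positive in this sketch, the transform preserves the ordering of values. This is one reason a linear transform can keep a numeric field useful while hiding its original values.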
Example
The following script shows these properties in use.
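```python
# Get the current flow and the node that feeds the Anonymize node
stream = sdg.script.stream()
typenode = stream.findByID("id42KW3MSA94B")  # ID of the upstream node in this example flow

# Create an Anonymize node at canvas position (192, 96) and connect it
node = stream.createAt("anonymize", "My node", 192, 96)
stream.link(typenode, node)

# Anonymize node properties are keyed by input field name,
# so each call names the field that the value applies to.

# Age: Transform method with random parameters and a fixed seed
node.setKeyedPropertyValue("enable_anonymize", "Age", True)
node.setKeyedPropertyValue("transformation", "Age", "Random")
node.setKeyedPropertyValue("set_random_seed", "Age", True)
node.setKeyedPropertyValue("random_seed", "Age", 123)

# Drug: Hash method with a custom prefix
node.setKeyedPropertyValue("enable_anonymize", "Drug", True)
node.setKeyedPropertyValue("use_prefix", "Drug", True)
node.setKeyedPropertyValue("prefix", "Drug", "myprefix")
```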
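Building on the example, the helper below applies a fixed transformation to a field. It uses only the setKeyedPropertyValue call and the property names documented above. The function name `set_fixed_transform` is hypothetical. The range checks come from the "normally 10" and "normally 1000" maxima in the property table. The sketch assumes that the field is numeric, so that it is anonymized by the Transform method. Inside Synthetic Data Generator, you would pass the node returned by stream.createAt.

```python
def set_fixed_transform(node, field, scale, translate):
    """Hypothetical helper: configure `field` on an Anonymize node to use
    fixed transformation parameters.

    `node` is any object exposing setKeyedPropertyValue(name, key, value),
    such as the node created with stream.createAt("anonymize", ...).
    """
    # The property table gives 10 and 1000 as the normal maximum values.
    if scale > 10:
        raise ValueError("scale should be at most 10")
    if translate > 1000:
        raise ValueError("translate should be at most 1000")
    node.setKeyedPropertyValue("enable_anonymize", field, True)
    node.setKeyedPropertyValue("transformation", field, "Fixed")
    node.setKeyedPropertyValue("scale", field, scale)
    node.setKeyedPropertyValue("translate", field, translate)


# Inside a Synthetic Data Generator script, for example:
# set_fixed_transform(node, "Age", scale=2, translate=10)
```

Validating the values before setting them keeps an out-of-range scale or translate from reaching the node silently.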