Optimizing dataset attribute groups

You can optimize the order of attributes to improve processing speed and to reduce both the number of dataset checks and resource usage.

In a large enterprise, the Dataset Attributes Database can track millions of datasets.

When you create a dataset attribute group, define the properties and attributes so that they limit the number of datasets that make up the group. If a large number of datasets in the Dataset Attributes Database match the criteria for your group, or if a large number of datasets must be examined to determine whether they match the criteria, this activity can require excessive processing, network, and space resources.

You can modify the order of the attributes to reduce the cost of data collection for a dataset attribute group.

By default, the first property (or the first attribute) is used as the primary index for initial filtering. All other attributes are then checked against the datasets that remain after the initial filtering.

For example, suppose that you have 10000 datasets and you specify two attributes: the first attribute matches 9000 datasets and the second attribute matches only 100. The second attribute is checked against each of the 9000 datasets that pass the first filter.

However, if you reverse the order of these two attributes, only 100 datasets remain after the first filter, so only 100 checks are needed for the second attribute. This order produces results faster and consumes less CPU.
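
The effect of attribute order in the example above can be illustrated with a minimal Python sketch. This is not product code: the function count_checks, the dataset records, and the two predicates are hypothetical, and the initial filter is modeled as a simple list scan rather than a real index lookup. The sketch only counts how many datasets the second attribute must be checked against for each order.

```python
from typing import Callable, Dict, List, Tuple

Dataset = Dict[str, int]
Predicate = Callable[[Dataset], bool]


def count_checks(datasets: List[Dataset],
                 predicates: List[Predicate]) -> Tuple[List[Dataset], int]:
    """Apply predicates in order.

    Returns the matching datasets and the number of checks performed
    by every predicate after the first one (the initial filter).
    """
    # The first predicate stands in for the primary-index filter.
    remaining = [d for d in datasets if predicates[0](d)]
    follow_up_checks = 0
    for predicate in predicates[1:]:
        # Each later predicate is checked against every remaining dataset.
        follow_up_checks += len(remaining)
        remaining = [d for d in remaining if predicate(d)]
    return remaining, follow_up_checks


# Hypothetical data: 10000 datasets identified only by a number.
datasets = [{"id": i} for i in range(10000)]


def broad(d: Dataset) -> bool:
    # Matches 9000 of the 10000 datasets.
    return d["id"] < 9000


def narrow(d: Dataset) -> bool:
    # Matches 100 of the 10000 datasets.
    return d["id"] % 100 == 0


matches_ab, checks_ab = count_checks(datasets, [broad, narrow])
matches_ba, checks_ba = count_checks(datasets, [narrow, broad])

print(checks_ab)  # 9000 follow-up checks when the broad attribute is first
print(checks_ba)  # 100 follow-up checks when the narrow attribute is first
```

Both orders return the same group members; only the amount of follow-up work differs, which is why placing the most selective attribute first is usually the cheaper choice.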

After a dataset attribute collection of any type completes, all groups are updated. A large number of dataset attribute groups can therefore cause high CPU consumption, especially because an incremental collection might run every 15, 30, or 60 minutes.
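
To get a rough sense of how the collection interval multiplies this cost, the following Python sketch estimates the number of group updates per day. It assumes that incremental collections run around the clock and that every group is updated once after each collection. The function name group_updates_per_day and the figure of 200 groups are hypothetical values chosen only for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def group_updates_per_day(group_count: int, interval_minutes: int) -> int:
    """Estimate daily group updates, assuming every collection updates all groups."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return runs_per_day * group_count


# Hypothetical installation with 200 dataset attribute groups.
for interval in (15, 30, 60):
    print(interval, group_updates_per_day(200, interval))
# 15 19200
# 30 9600
# 60 4800
```

Under these assumptions, halving the number of groups or doubling the interval halves the number of updates, so both are worth reviewing when CPU consumption is high.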