Sampler operator reference

The Sampler operator is a filter that creates a repeatable sample of any incoming virtual input table.
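
To illustrate what "repeatable" means here, the following minimal Python sketch draws a row-level random sample with a fixed seed, so that the same input and seed always give the same sample. It is only an illustration of the idea, not the Sampler operator's actual implementation; the function name `repeatable_sample` and the `rate` and `seed` parameters are hypothetical.

```python
import random


def repeatable_sample(rows, rate, seed=42):
    """Keep each row with probability `rate`.

    Hypothetical helper: a fixed seed makes the result reproducible,
    which is the property a repeatable sample provides.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Use a private generator so the global random state is left untouched.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < rate]


rows = list(range(1000))
first = repeatable_sample(rows, 0.1, seed=7)
second = repeatable_sample(rows, 0.1, seed=7)
print(first == second)  # same input and seed, so the samples are identical
```

Because the sample does not change between runs, differences between two models can be attributed to the mining settings rather than to the data that happened to be drawn.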

Building a mining model on a very large input table can take a long time. One solution is to build the model on a random sample of the input data. This is especially useful during iterative development, when you test different mining settings to find the combination that produces a good model. Once you are satisfied with the mining settings, you can remove the Sampler operator and perform the final mining run to build the model on the whole data set.

Tip: You can also specify a Sampling Rate in the properties view of the table source operator. Whenever possible, sample in the table source operator, because it uses Db2® built-in sampling, which performs very well. To sample intermediate results in your mining flow, for example the result of joining two tables, use the Sampler operator.
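
For orientation, Db2 exposes built-in sampling in SQL through the TABLESAMPLE clause. The Python sketch below builds such a statement as a string. It assumes this is the kind of sampling the tip refers to; the exact SQL that the table source operator generates is not documented here. The helper `db2_sample_query` and the table name `SALES.TRANSACTIONS` are hypothetical.

```python
def db2_sample_query(table, percent, seed=None):
    """Build a Db2 SELECT that samples roughly `percent` of the rows.

    Hypothetical helper for illustration only. Do not pass untrusted
    input as `table`, because it is interpolated into the SQL text.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    # BERNOULLI samples individual rows.
    sql = f"SELECT * FROM {table} TABLESAMPLE BERNOULLI ({percent})"
    if seed is not None:
        # REPEATABLE makes the same rows come back on each run,
        # as long as the table data is unchanged.
        sql += f" REPEATABLE ({seed})"
    return sql


query = db2_sample_query("SALES.TRANSACTIONS", 10, seed=5)
print(query)
```

A statement of this form can only sample a base table, which is why intermediate results such as a join output need the Sampler operator instead.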

