Random sampling concepts

In general, the sampling types random, row, and block are supported in IBM watsonx.data intelligence. Several conditions define how the sample is composed.

For connected data assets, the service first checks whether the connector supports pushing sampling down to the data source. If the connector supports the requested sampling type, sampling happens at the data source.

If the connector does not support any of these sampling types, records are read in batches of 10,000, and 80% of the records in each batch are picked for the sample until the required sample size is reached.
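The fallback behavior can be sketched in Python as follows. This is an illustrative sketch only, not the product's implementation: the function name `sample_in_batches` and its parameters are hypothetical, and picking the 80% uniformly at random within each batch is an assumption, because the selection method inside a batch is not documented here.

```python
import random
from itertools import islice


def sample_in_batches(records, sample_size, batch_size=10_000, fraction=0.8, seed=None):
    """Hypothetical sketch: pick a fraction of each batch until the sample is full.

    Returns the sample and the number of batches that were read.
    """
    rng = random.Random(seed)
    source = iter(records)
    sample = []
    batches_read = 0
    while len(sample) < sample_size:
        # Read the next batch; stop if the source is exhausted.
        batch = list(islice(source, batch_size))
        if not batch:
            break
        batches_read += 1
        # Assumption: the 80% is chosen uniformly at random, without replacement.
        picked = rng.sample(batch, int(len(batch) * fraction))
        # Keep only as many records as are still missing from the sample.
        sample.extend(picked[: sample_size - len(sample)])
    return sample, batches_read


if __name__ == "__main__":
    sample, batches_read = sample_in_batches(range(10_000_000), 50_000, seed=42)
    print(len(sample), batches_read)
```

Because the source is consumed lazily, only as many batches are read as are needed to fill the sample.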

For example, suppose a table has 10,000,000 records and a random sample of 50,000 records is needed. From each batch of 10,000 records, 80% are picked, which makes 8,000 records per batch. Because 50,000 / 8,000 = 6.25, about 7 batches of 10,000 records are read to get a sample of 50,000 records.
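The batch count in this example can be checked with a few lines of Python. The helper name `batches_needed` is hypothetical and serves only to illustrate the arithmetic; it assumes the table holds enough records to fill the sample.

```python
import math


def batches_needed(sample_size, batch_size=10_000, fraction=0.8):
    """Hypothetical helper: number of batches to read for a given sample size."""
    picked_per_batch = int(batch_size * fraction)  # 8,000 with the defaults
    return math.ceil(sample_size / picked_per_batch)


print(batches_needed(50_000))  # 50,000 / 8,000 = 6.25, rounded up to 7
```

Note that the total table size (10,000,000 records) does not enter the calculation; only the sample size, the batch size, and the 80% fraction do.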