Use cases

While it is technically feasible to push any custom code into the database, doing so is not always optimal. Netezza is known for its massively parallel processing capabilities: data is distributed across data slices, SQL operations execute in parallel on those slices (on the worker nodes, or SPUs), and results are finally aggregated on the master node (the host). In-database execution is therefore ideal when the use case can leverage Netezza’s parallelism.
1. Applying a custom data transformation to every record of a database table.
This does not require aggregating the data in one place before your routine is applied; it can run in parallel on the different data slices. This is an optimal scenario. See NZFunApply.
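A record-wise transformation touches only one row at a time, which is why it parallelizes so well. The sketch below models that shape in plain Python: each "data slice" is just a list of rows and a thread pool stands in for the SPUs. It is an illustration of the execution pattern only, not the NZFunApply API, and the row data is made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for Netezza's slice-parallel execution: each "data slice"
# is a plain list of rows, and the thread pool plays the role of the
# SPUs. (Illustration only; nothing here is the real NZFunApply API.)
slices = [
    [("a", 10.0), ("b", 12.5)],   # rows on data slice 1
    [("c", 7.25), ("d", 3.0)],    # rows on data slice 2
]

def transform(row):
    # Record-wise: depends on one row only, so no data from any
    # other slice is needed to compute the result.
    key, value = row
    return key, round(value * 1.08, 4)  # e.g. apply an 8% uplift

def process_slice(rows):
    return [transform(r) for r in rows]

# Every slice is transformed independently and concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_slice, slices))
```

Because `transform` never reads across slices, no data movement to the host is required, which is exactly what makes this the optimal case.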
2. Building an ML model against all the available rows.
Because of the Netezza host and worker-node architecture, building one model per data slice is optimal: multiple worker nodes can work on it in parallel. However, if a single model must be built over the entire table, the data first has to be aggregated in one place, which limits Netezza’s ability to run things in parallel. This is not an optimal scenario. See NZFunTApply.
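The sketch below shows why a single global model forces aggregation: a least-squares fit needs every (x, y) pair in one place before the arithmetic can start. This is a conceptual illustration in plain Python with made-up data, not the NZFunTApply API; the "gather" step plays the role of Netezza collecting all rows on the host.

```python
from statistics import mean

# Rows spread across two hypothetical data slices.
slices = [
    [(1.0, 2.1), (2.0, 3.9)],   # rows on data slice 1
    [(3.0, 6.2), (4.0, 7.8)],   # rows on data slice 2
]

# Step 1: gather -- the equivalent of moving all rows to the host
# before the user's model-building function runs. This is the
# serialization point that limits parallelism.
rows = [r for s in slices for r in s]

# Step 2: fit a one-variable least-squares line y = a*x + b over
# the full dataset (a minimal stand-in for "building an ML model").
xs = [x for x, _ in rows]
ys = [y for _, y in rows]
mx, my = mean(xs), mean(ys)
a = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
```

The fit itself is cheap; the cost of this scenario is step 1, where all slices must funnel into one place.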
3. Building ML models for each partition.
The data can be partitioned so that each partition is an independent dataset, and your goal is to build one ML model per partition. This does not require aggregating the data in one place before your model-building code runs, so you can exploit Netezza’s parallelism by building the per-partition models in parallel. This is an optimal scenario. See NZFunGroupedApply.
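Partition-wise modeling can be sketched in a few lines: group the rows by a partition key, then fit one tiny "model" (here just a per-group mean) for each group, with the fits running concurrently. This is an illustration of the pattern NZFunGroupedApply exploits, not its API; the partition keys and values are invented for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Toy table: (partition key, value) pairs.
rows = [
    ("store_1", 10.0), ("store_1", 14.0),
    ("store_2", 7.0),  ("store_2", 9.0),  ("store_2", 11.0),
]

# Group rows by partition key; each group is an independent dataset.
partitions = defaultdict(list)
for key, value in rows:
    partitions[key].append(value)

def fit(item):
    # Per-partition mean as a minimal stand-in for a real model fit;
    # it reads only its own partition's rows.
    key, values = item
    return key, mean(values)

# Fits are independent, so they can run in parallel -- one model
# per partition, no cross-partition data movement.
with ThreadPoolExecutor() as pool:
    models = dict(pool.map(fit, partitions.items()))
```

Each fit touches only its own partition, so, unlike the whole-table case above, no aggregation step serializes the work.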
4. Exploring the data.
To explore the data or gather statistics on it, the data must be aggregated in one place before your routine is applied, which limits Netezza’s ability to run things in parallel. This is not an optimal scenario. However, because this is such a common task, in-database SQL implementations are provided for most data exploration operations behind a Pandas dataframe abstraction, and you can readily benefit from those. See Exploring data with built in SQL translations.
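Exploration maps well to SQL pushdown because a describe()-style summary reduces to a single aggregate query, roughly `SELECT COUNT(col), AVG(col), MIN(col), MAX(col) FROM tab`, whose partial aggregates the database can still compute per slice and combine on the host. The sketch below computes the same summary in plain Python over toy data; it illustrates the idea only and is not the package's actual translation layer.

```python
# Toy column of values, standing in for one column of a table.
values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

# A describe()-like summary: each entry corresponds to one SQL
# aggregate (COUNT, AVG, MIN, MAX), which the database evaluates
# in-place instead of shipping the rows to the client.
summary = {
    "count": len(values),
    "mean": sum(values) / len(values),
    "min": min(values),
    "max": max(values),
}
```

This is why the provided SQL translations are worth preferring over custom exploration code: the heavy lifting stays in the database even though the final result lands in one place.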