Understanding row buffering

AE row buffering can increase performance. Since AEs run in a separate OS process, a significant part of performance overhead may be the cost of marshaling row data between the NPS process and an AE process.

This performance cost is affected by the raw amount of data transferred and by thenumber of data round trips between the NPS system and an AE. The performance cost occurs because a large number of OS process context switches can be expensive.

One way to lower this cost within an application is to put more information in each row, creating larger row sizes. This approach can be useful for AEs called from any type of SQL function. Keep in mind, however, that the Netezza system row size is limited to 64K bytes.

Another way to improve performance is to use AE row buffering. This functionality can only be used for AEs invoked through SQL table functions. AE row buffering is most useful when inputting or outputting a large number of small rows. When activated, the AE automatically places rows into a buffer and only transmits the buffer between the NPS system and AE when it becomes full or the end of data is reached. AE row buffering can be set for both input and output rows (full buffering) or for output rows only (result only B-buffering).

The size of the buffer is an internal implementation detail of the Netezza system. It is large enough to be useful but not so large that a single function overuses memory resources. The implication is that row buffering increases performance with smaller records (of roughly less than 1000 bytes) and decreases performance with larger records.

Note that when you use full buffering, the SQL output connection between the input and output rows is lost. For example, a table function implemented using an AE that returns exactly one output row per input row is called in this manner:

SELECT t.col1, f.result from mytable t, table WITH
final(mytablefunction(t.col1)) f;

When not using AE row buffering, the output f.result always corresponds to the input t.col1 that produced it. The use of AE row buffering means that this correspondence is lost in the output. There are many application scenarios where this correspondence is not required., however if it is required, a possible workaround is to have the table function return both the result and the input that produced it so that the values correspond.

AE row buffering is controlled by the AE environment variable NZAE_ROW_BUFFER. For more information, see Row buffering.