Calculating the Memory Use of a SPUPad
When you incorporate SPUPads into your UDX applications, make sure that you include the memory use of the SPUPad in the MAXIMUM MEMORY setting for your UDX. This helps to ensure correct performance scheduling for the queries that use the UDX.
You can obtain a rough estimate of the memory allocation of a SPUPad from the number and size of the objects that you store in it. For each object, also count the space that is consumed by the pointer that refers to it: add an int for each declared object in the SPUPad.
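The rough estimate described above can be sketched as a small helper. This is only an illustration of the arithmetic, not part of the UDX API: the PadObject structure and the estimatePadBytes function are hypothetical names, and the sketch assumes that every stored object counts as one declared object with one int of pointer overhead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one kind of object stored in a SPUPad.
struct PadObject {
    std::size_t count;      // how many objects of this kind are stored
    std::size_t sizeBytes;  // size of each object, in bytes
};

// Rough estimate for a single SPUPad: the size of every object plus
// one int per object for the pointer that refers to it.
std::size_t estimatePadBytes(const std::vector<PadObject>& objects) {
    std::size_t total = 0;
    for (const PadObject& o : objects) {
        total += o.count * (o.sizeBytes + sizeof(int));
    }
    return total;
}
```

For example, calling estimatePadBytes with 1000 objects of 64 bytes and one 4096-byte buffer gives the payload sizes plus 1001 times sizeof(int). Treat the result as a starting point and refine it with measured values.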
For SPUPads that run in S-Blade memory, make sure that you consider the number of dataslices that are managed by the S-Blade. Typically, an S-Blade in an IBM® Netezza® 100 or IBM Netezza 1000/IBM PureData® System for Analytics N1001 system manages 8 dataslices by default (sometimes 6, and sometimes more if one or more S-Blades have failed within the SPA). An IBM PureData System for Analytics N200x system has S-Blades that manage 40 dataslices by default. The system creates one SPUPad for each dataslice that contains rows that are being queried.
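Because one SPUPad is created for each dataslice that is queried, a per-SPUPad estimate can be scaled by the dataslice count to gauge the load on one S-Blade. The following sketch shows that multiplication. The constant and function names are hypothetical, and it assumes the worst case in which every dataslice on the S-Blade contains queried rows; the dataslice counts are the defaults quoted above and can differ on your system.

```cpp
#include <cstddef>

// Default dataslices per S-Blade, as quoted in the text; verify on your system.
constexpr std::size_t kDataslicesN1001 = 8;   // Netezza 100, 1000, N1001
constexpr std::size_t kDataslicesN200x = 40;  // N200x

// Worst-case SPUPad memory on one S-Blade: one SPUPad per managed dataslice.
constexpr std::size_t estimateBladeBytes(std::size_t padBytes,
                                         std::size_t dataslices) {
    return padBytes * dataslices;
}
```

With a 1 KB SPUPad, estimateBladeBytes(1024, kDataslicesN1001) yields 8192 bytes and estimateBladeBytes(1024, kDataslicesN200x) yields 40960 bytes, which shows how quickly the footprint grows on systems with more dataslices per S-Blade.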
You can use the getTotalSize call in a UDX to determine the size of the SPUPad, and then use that information to update the MAXIMUM MEMORY value of the UDX. Memory calculation is often an iterative process; for example, as you debug your UDX in the test harness, you can use getTotalSize to calculate the memory and return the value by using logMsg(). You can also use the call within your UDFs as they run on a development system, and as the memory allocation becomes better known, you can use the ALTER FUNCTION or ALTER AGGREGATE commands to modify the MAXIMUM MEMORY setting appropriately.
It is important to set a realistic maximum memory for your user-defined functions, especially those with SPUPads. If the memory estimate is too low, the system could schedule the UDF to run at times when sufficient memory is not available, which would cause the UDF to fail. If the estimate is higher than necessary, the system might not schedule the UDF to run until that much memory is available, which could delay the queries that use the UDF.