Remote AE Processing

The remote AE listening on an AE connection point receives notifications from the Netezza system that a new instance of an SQL function is being called. The remote AE then creates a new thread, obtains a thread from a pool, or forks a new process to handle the SQL function call. The thread or process handling the SQL function creates a data connection for that function. When the request is completed successfully, it is important that the remote AE calls "done" on the data connection and closes it. If an error occurs, it is important that the AE calls "error" on the data connection and then closes it. The same thread or process can be used to serially process SQL function calls so it is possible to use a thread pool or a process pool.

Below are four possible scenarios that can be encountered when implementing AEs.

Scenario 1: Language Supports Threads

Using a language that supports threads, possibly Java or C++ with POSIX threads, you run one instance of the AE daemon-style on each SPU and one on the host. Each SQL function is handled on a thread taken from a pool, so that requests are handled concurrently. In this case, the AE connection point address consists only of the string name and nothing else. The remote AE has one listener on this address.

Scenario 2: Language Does Not Support Threads

Using a language that does not support threads, does not support them well, or where threads are undesirable, use the same connection point as Scenario 1. To process SQL functions, you must fork and handle the SQL function in the new process. In this case, the listening process listens for AE environments, which are sets of AE environment variables that describe an AE SQL function request. After forking, the AE environment can be used to create an AE data connection.

Scenario 3: Use of a Single Instance

The solutions for the first two scenarios dictate that when you use the database you use the same single instance of the remote AE per SPU or host. If for security or other reasons you do not want others sharing AEs, you can add session ID to the remote AE address.

To do so, set NZAE_REMOTE_NAME_SESSION=1 (or the register_ae option --rsession) when registering the SQL function, and specify the session ID when you create the connection point in the remote AE.

Note that you must still use threads or fork new processes because multiple simultaneous SQL function call notifications are still received. This occurs because a SPU consists of multiple dataslices, and simultaneous function calls come from each dataslice.

You must launch the remote AE after the session starts and stop the remote AE before it ends. (Launching and stopping remote AEs is covered in Launching a Remote Analytic Executable.) Each session uses its own instance of the remote AE process.

Scenario 4: Data slice in the Remote AE Address

There are two possible approaches that use dataslices in the remote AE address. The first is to launch one AE per dataslice on each SPU. The SQL function is registered with NZAE_REMOTE_NAME_DATA_SLICE=1 using (or the register_ae option --rdataslice). Each remote AE can use an API call to return the dataslice for which it was launched. It then listens on a remote address consisting of string name and dataslice ID. If there are four dataslices on a SPU, then there are four remote AE processes. Each remote AE inputs data only for its dataslice.

This approach works well for a programming language like C++, however it may not work as well if each remote AE process has very large resource requirements or is written in a programming language with a resource-intensive runtime environment. In this situation, there is one heavy AE process per dataslice; collectively the multiple processes may be consuming too many system resources, especially memory.

The other approach is a more complex way to use dataslices in the remote AE address. A single remote AE process can use multiple notification connections simultaneously, as long as they have different addresses. In Scenario 1, one notification listener was used for the entire process with a remote AE address string name. For that scenario, consider the case where you run a single query that runs on the SPUs. A remote AE running on a SPU gets multiple SQL function call notifications, one from each dataslice, on the SPU. The notifications are simultaneous but are processed serially by the listener. In Scenario 1, each notification was received and assigned to a thread in the pool to service the SQL function. The data connections are processed simultaneously but some start before others because the notifications are processed individually.

For efficiency, you want one notification listener per dataslice. If there are four dataslices on a SPU, you create four threads, each listening to an AE connection point specified with an address built from remote string name and dataslice ID.

The SQL function is still registered with NZAE_REMOTE_NAME_DATA_SLICE=1 using the register_ae option --rdataslice.

Assume the remote name is "MY_REMOTEAE" and the dataslices on a given SPU are 17, 18, 19, and 20. In the AE process, the four notification threads are listening on AE connection point addresses:

{ name=MY_REMOTEAE, data slice id=17 }
{ name=MY_REMOTEAE, data slice id=18 }
{ name=MY_REMOTEAE, data slice id=19 }
{ name=MY_REMOTEAE, data slice id=20 }

When a listening thread receives a notification of a new SQL function call, it retrieves a notification thread from the pool and assigns it to that SQL function request for the assigned dataslice.

Finally, you want a listener to handle AE job control commands. Job control allows ping and stop commands to be sent to a running remote AE. The connection point is:

{ name=MY_REMOTEAE }

There is no direct API call to determine how the single AE process running on the SPU determines which dataslices to listen for. There is a database view called _v_dual_dslice that contains one record for each dataslice for the entire database. For example:

SELECT * FROM _v_dual_dslice;
DSID 
------
1
4
2
3
…

To use this view, register a scalar function that connects to the remote AE using the address string name. The scalar takes one integer argument and is called using column DSID in view _v_dual_dslice as an argument. After the remote AE is launched, this scalar function is called. The remote AE on each SPU receives one notification for each dataslice. When the remote AE opens the data connection, it obtains the dataslice for that request and uses that slice to start the listener for address string name, dataslice on a new thread. The scalar function is processed normally to return a value and close the data connection. Since the scalar function is being used to provide the remote AE information, the return value is irrelevant.

To summarize, in Scenario 1, you used one listener. In Scenario 4, approach one, you used one remote AE process per dataslice, each with one listener. For Scenario 4, approach two, you used five different notification listeners in a single remote AE process.