Data processing

The purpose of a module is to extract some information from its input, and generate some contextual information and metrics as output.

Limitations

As this is a sample module, some protocols may have only limited implementation. For example, for LDAP, only search type requests are extracted. You can expand the information that can be decoded for a protocol using the Generic TCP Module.

Event-driven processing

Processing is triggered by a change in the input (for example, additional payload), which leads to an invocation of the module’s process function.

Note: The Web Response Time Module API is designed to support chaining of modules. This feature is not currently implemented, but this guide refers to it. In the current implementation, each user module is provided with data from a TCP or IPv4 segment reassembler, which delivers data to modules based on request/reply state changes. Data from the user modules is not processed by any further modules; the data is sent directly to the Web Response Time agent for filtering, aggregation, and reporting.

Each invocation of the module’s process function is provided with three parameters: the wrt_module_instance_t initialized by init, a wrt_api_session_t handle, and a wrt_api_data_t handle. The session handle is common for each call to process for the same network session (for example, TCP session); this handle can be used to maintain state between calls to process. The data handle provides access to the currently available request/reply data, context, and metrics.

TCP-based protocols are usually stateful, which means some state must be stored between calls to the module’s process function to decode them. Even non-stateful protocols may require some state passing, as the processing may be provided with partial data that must be either processed immediately, or buffered by the module. Both scenarios are described in the following sections.

Storing state

As mentioned above, the wrt_api_session_t handle may be used to maintain state between calls to process. To do this, use the wrt_module_api_t set_userdata and get_userdata functions.

For example, to store some data in the session, use the following code:

void my_destructor(wrt_api_session_t session, void *data) {
    free(data);
}
...
void *userdata = malloc(sizeof(long)); /* any data that fits in void* */
wrt_api_status_t status = api->set_userdata(session, userdata, &my_destructor);

If the call to set_userdata succeeds (that is, it returns zero), retrieve the value later with get_userdata. When the session terminates, the destructor (if specified) will be invoked with the session and the userdata as arguments.

To retrieve the userdata, call the API as follows:

void *userdata;
wrt_api_status_t status = api->get_userdata(session, &userdata);

If no data was previously set, get_userdata returns WRT_API_STATUS_NODATA; otherwise it copies the value into the provided pointer, and then returns WRT_API_STATUS_OK (zero). For a new session, userdata is always unset. A common pattern for initializing state for session decoding is to first call get_userdata, check if WRT_API_STATUS_NODATA was returned, and if so create a new state object and call set_userdata.

struct my_session_state *state = NULL;
wrt_api_status_t status = api->get_userdata(session, (void**)&state);
if (status == WRT_API_STATUS_NODATA)
{
    /* First call for this session: create the state object. */
    state = malloc(sizeof(struct my_session_state));
    if (state == NULL)
    {
        /* catastrophic failure: out of memory. */
    }
    else
    {
        /* init state */
        status = api->set_userdata(session, state, &destroy_state);
        if (status != WRT_API_STATUS_OK)
        {
            /* catastrophic failure: could not set state. */
        }
    }
}
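
The destroy_state destructor passed to set_userdata in the pattern above is not defined in this guide. The following is a minimal sketch of what it might look like, assuming a hypothetical struct my_session_state that owns a heap-allocated buffer. The create_state helper and the structure's fields are illustrative only. The typedef of wrt_api_session_t is a stand-in so that the sketch compiles on its own; in a real module the type comes from the API header.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the real handle type (an assumption made only so that
 * this sketch compiles on its own; use the API header in a module). */
typedef void *wrt_api_session_t;

/* Hypothetical per-session state; not part of the API. */
struct my_session_state {
    unsigned char *buf; /* heap-allocated copy of unprocessed payload */
    size_t len;         /* number of bytes held in buf */
};

/* Hypothetical helper: allocate an empty state object, or NULL on failure. */
struct my_session_state *create_state(void)
{
    struct my_session_state *state = malloc(sizeof *state);
    if (state != NULL) {
        state->buf = NULL;
        state->len = 0;
    }
    return state;
}

/* Destructor with the same signature as my_destructor above.
 * Releases everything the state object owns, then the object itself. */
void destroy_state(wrt_api_session_t session, void *data)
{
    struct my_session_state *state = data;
    (void)session; /* not needed for cleanup in this sketch */
    if (state != NULL) {
        /* Invariant: a NULL buffer implies a zero length. */
        assert(state->buf != NULL || state->len == 0);
        free(state->buf);
        free(state);
    }
}
```

Because the destructor releases the nested buffer as well as the structure, the module does not need any separate cleanup when the session terminates.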

Buffering data

In order to minimize resource requirements, the module container does not retain payload data after it has provided it to a module. If a module is presented with partial data, and the module cannot process the data until it is received in its entirety, the module must perform its own buffering.

To buffer data, use the session state and userdata mechanism described above. For example, you could store a state structure which contains the amount of data buffered so far, and a pointer to a heap-allocated copy of the data. The API flow in process is similar to the following:

Obtain or initialize the session state, using the pattern described above.
Obtain the request/reply payload data, accumulating it into any previously buffered payload data.
Process the currently buffered data, and retain only the unprocessed data.
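
The accumulate-and-retain steps above can be sketched as follows. This is an illustration only: struct my_session_state and the my_state_* functions are hypothetical names, and the sketch treats a newline as the end of a complete record, which is an assumption rather than a property of any particular protocol. It deliberately leaves out the API calls. In a real module the state object would be obtained with get_userdata, and the payload would come from the data handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-session state; not part of the API. */
struct my_session_state {
    unsigned char *buf; /* heap-allocated copy of unprocessed payload */
    size_t len;         /* number of bytes held in buf */
};

/* Append n bytes of new payload to the buffer.
 * Returns 0 on success, or -1 if memory could not be allocated
 * (in which case the existing buffer is left unchanged). */
int my_state_append(struct my_session_state *st, const void *payload, size_t n)
{
    unsigned char *grown;
    if (n == 0)
        return 0;
    grown = realloc(st->buf, st->len + n);
    if (grown == NULL)
        return -1;
    memcpy(grown + st->len, payload, n);
    st->buf = grown;
    st->len += n;
    return 0;
}

/* Discard the first n bytes (already processed) and keep the rest. */
void my_state_consume(struct my_session_state *st, size_t n)
{
    assert(n <= st->len);
    st->len -= n;
    if (st->len == 0) {
        free(st->buf);
        st->buf = NULL;
    } else {
        memmove(st->buf, st->buf + n, st->len);
    }
}

/* Count the complete newline-terminated records in the buffer, then
 * retain only the trailing partial record. Returns the record count. */
size_t my_state_process_lines(struct my_session_state *st)
{
    size_t records = 0, processed = 0, i;
    for (i = 0; i < st->len; i++) {
        if (st->buf[i] == '\n') {
            records++;
            processed = i + 1;
        }
    }
    my_state_consume(st, processed);
    return records;
}
```

For example, if one call to process delivers the 9 bytes "GET /a\nGE", one record is processed and the 2 bytes "GE" stay buffered. When the next call delivers "T /b\n", the buffer holds a second complete record and is empty again once that record is processed.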

Contextual information and metrics

When a module processes some data, it may choose to send it along to the next module in the processing chain, typically with some additional information that it has extracted from the input. The data that is sent is transferred through a wrt_api_data_t handle.

A wrt_api_data_t has associated context information (e.g. the source and destination IP addresses, source and destination TCP ports), and some metrics (e.g. the request/reply response time, request timestamp, reply timestamp). Each module in a processing chain may add to or modify the values in the set, but never remove information. Thus, all input context and metrics are implicitly output; only their values may be modified, and additional context and metrics may be added.

To set context and metrics, a module requires a unique numeric ID for each context and metric item as described in “Module initialization”. These IDs are provided to the module via a wrt_module_config_t structure, and the module supplies them to calls to the get/set_metric and get/set_context API functions.

Context example

A call to get or set context is shown below. It assumes that a context ID for the context item baz was obtained as described in “Module initialization”.

wrt_context_id_t baz_id; /* Assigned in foo_init_function. */
wrt_context_type_t ctx_type;
const void *ctx_value;
size_t ctx_size;
wrt_api_status_t status;

status = api->get_context(data, baz_id, &ctx_type, &ctx_value, &ctx_size);
/* Do something with the value, then update it. */
status = api->set_context(data, baz_id, ctx_type, ctx_value, ctx_size);

There are various in-built context items, depending on the underlying protocol. The numeric IDs for these context items can be obtained in the same way as described previously. For request/reply TCP or IPv4, the keys of the context items are:

Table 1. Syntax descriptions
Context key	Description	Type
tcp.srcport	Source TCP port	uint16
tcp.dstport	Destination TCP port	uint16
ipv4.srcaddr	Source IPv4 address	ipv4
ipv4.dstaddr	Destination IPv4 address	ipv4
ipv4.origsrcaddr	Original source IPv4 address	ipv4
ipv4.origdstaddr	Original destination IPv4 address	ipv4

Note: The value of ipv4.srcaddr may be updated to represent a source address other than the actual address. For example, for HTTP, it may be set to the address reported in the X-Forwarded-For header. The value of ipv4.origsrcaddr should always be the actual source IPv4 address (for example, the address of the proxy server).

Metrics example

Metrics are handled similarly to context. See below for an example of using get_metric and set_metric:

wrt_metric_id_t server_time_id; /* Assigned in foo_init_function. */
wrt_metric_type_t type;
wrt_metric_value_t value;
wrt_api_status_t status;

status = api->get_metric(data, server_time_id, &type, &value);
/* wrt_metric_value_t is a union of basic integer types.
 * If you don't know the type of the metric ahead of time,
 * check the "type" variable updated by get_metric, and switch
 * on the result. For brevity, we assume a specific type here. */

/* The Server Time metric is an unsigned 64-bit integer. */
value.u64 += 42;
status = api->set_metric(data, server_time_id, type, value);

As with context, there are various in-built metrics, depending on the underlying protocol. For request/reply TCP or IPv4, the metrics are:

tcp.response_time.total
tcp.response_time.server
tcp.response_time.network
tcp.response_time.load
tcp.response_time.resolve
tcp.response_time.client_render

These metrics all have a type of uint64.

See the Enhanced network timing calculations for Web Response Time metrics in the Administrator's Guide for definitions of these metrics.

Trace logging

To enable you to debug a module, the API provides two functions for logging: init_log and log_message.

init_log is an optional function for registering a logging handle with a specified filename. The filename is used only for identifying log messages. A typical call to init_log looks like:

wrt_api_log_handle_t log_handle;
wrt_api_status_t status = api->init_log(__FILE__, &log_handle);

log_message formats and logs a message to the module container’s log, optionally specifying a log handle initialized with init_log. The format is the same as in the C89/C99 vprintf function. Calls to log_message specify a log level, which is interpreted by the container to determine whether or not to log the message. A typical call to log_message looks like:

api->log_message(log_handle, WRT_API_LOG_ERROR,
                 __func__, __LINE__,
                 "send_data failed with status code %d",
                 (int)status);

The log handle parameter may optionally be NULL, in which case the log message is associated with a filename of the module container’s choosing, instead of a filename specified by the module.