Making kernel extensions DLPAR-aware

Like applications, most kernel extensions are DLPAR-safe by default.

However, some are sensitive to the system configuration and might need to be registered with the DLPAR subsystem. Some kernel extensions partition their data along processor lines, create threads based on the number of online processors, or provide large pinned memory buffer pools. These kernel extensions must be notified when the system topology changes. The mechanism and the actions that need to be taken parallel those of DLPAR-aware applications.

Registering reconfiguration handlers

The following kernel services are provided to register and unregister reconfiguration handlers:
#include <sys/dr.h>

int reconfig_register(int (*handler)(void *, void *, int, dr_info_t *),
                      int actions, void *h_arg, ulong *h_token, char *name);

int (*handler)(void *event, void *h_arg, int req, dr_info_t *dr_info);

int reconfig_register_ext(int (*handler)(void *, void *, unsigned long long, dr_info_t *),
                          unsigned long long actions, void *h_arg, ulong *h_token, char *name);

int (*handler)(void *event, void *h_arg, unsigned long long req, void *resource_info);

kerrno_t reconfig_register_list(int (*handler)(void *, void *, dr_kevent_t, void *),
                                dr_kevent_t event_list[], size_t list_size,
                                void *h_arg, ulong *h_token, char *name);

int (*handler)(void *event, void *h_arg, dr_kevent_t event_in_prog, void *resource_info);

void reconfig_unregister(ulong h_token);
Note: You are encouraged to use the reconfig_register_list kernel service, which can notify kernel extensions of a larger set of events. The earlier reconfig_register and reconfig_register_ext kernel services are limited to 32 and 64 events, respectively, so kernel extensions that use them are not portable to future systems that support more than 32 or 64 events.
The parameters for the reconfig_register, reconfig_register_ext, and reconfig_register_list subroutines are as follows:
  • The handler parameter is the kernel extension function to be invoked.
  • The actions parameter allows the kernel extension to specify which events require notification. For a list of the events, see the reconfig_register, reconfig_register_ext, and reconfig_unregister kernel services.
  • The h_arg parameter is specified by the kernel extension, remembered by the kernel along with the function descriptor for the handler, and then passed to the handler when it is invoked. It is not used directly by the kernel, but is intended to support kernel extensions that manage multiple adapter instances. In practice, this parameter points to an adapter control block.
  • The h_token parameter is an output parameter and is intended to be used when the handler is unregistered.
  • The name parameter is provided for information purposes and can be included within an error log entry if the driver returns an error. It is provided by the kernel extension and should be limited to 15 ASCII characters.
  • The event_list parameter is an array of dr_kevent_t values for which the kernel extension should be notified of when they occur. For a list of defined events, see the reconfig_register_list kernel service.
  • The list_size parameter is the size of the memory consumed by the event_list parameter.

The reconfig_register and reconfig_register_ext functions return 0 for success, and the appropriate errno value otherwise.

The reconfig_unregister function is called to remove a previously installed handler.

The reconfig_register, reconfig_register_ext, and reconfig_unregister functions can only be called in the process environment.

If a kernel extension registers for the pre-phase, it is advisable that it also register for the check phase to avoid partial unconfiguration of the system when removing resources.
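
As an illustration, the following sketch registers a handler with reconfig_register_list for the per-operation memory events that are described later in this section. The my_adapter control block, the my_dr_handler function, and the "mydrv" name are hypothetical, and you should confirm that DR_MEM_REMOVE_OP_PRE, DR_MEM_REMOVE_OP_POST, and DR_MEM_ADD_OP_POST are defined as dr_kevent_t values in the reconfig_register_list documentation for your level of AIX.

#include <sys/types.h>
#include <sys/dr.h>

/* Hypothetical per-driver control block, passed back through h_arg. */
struct my_adapter {
        ulong   dr_token;       /* token returned by reconfig_register_list */
        /* ... other driver state ... */
};

/* Reconfiguration handler; a possible body is sketched later in this section. */
static int my_dr_handler(void *event, void *h_arg,
                         dr_kevent_t event_in_prog, void *resource_info);

/* Events this extension wants to be notified about. */
static dr_kevent_t my_dr_events[] = {
        DR_MEM_REMOVE_OP_PRE,
        DR_MEM_REMOVE_OP_POST,
        DR_MEM_ADD_OP_POST
};

int
my_dr_attach(struct my_adapter *adap)
{
        kerrno_t rc;

        rc = reconfig_register_list(my_dr_handler,
                                    my_dr_events,
                                    sizeof(my_dr_events),  /* list_size */
                                    adap,                  /* h_arg     */
                                    &adap->dr_token,       /* h_token   */
                                    "mydrv");              /* <= 15 ASCII chars */
        return (rc == 0) ? 0 : -1;
}

void
my_dr_detach(struct my_adapter *adap)
{
        reconfig_unregister(adap->dr_token);
}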

Reconfiguration handlers

The interface to the reconfiguration handler used with the reconfig_register_list kernel service is as follows:
int (*handler)(void *event, void *h_arg, dr_kevent_t event_in_prog, void *resource_info);
The parameters to the reconfiguration handler are as follows:
  • The event parameter is passed to the handler and is intended to be used only when calling the reconfig_handler_complete subroutine.
  • The h_arg parameter is specified at registration time by the handler.
  • The event_in_prog parameter indicates the DLPAR event that is in progress. For a list of the events, see the reconfig_register_list kernel service.
  • The resource_info parameter identifies the resource-specific information for the current DLPAR request. If the request is processor-based, then the resource_info data is provided through a dri_cpu structure. If the request is memory-based, a dri_mem structure is used. On a Micro-Partitioning® partition, if the request is processor-capacity based, the resource_info data is provided through a dri_cpu_capacity structure. For more information, and for the format of the dri_cpu_capacity structure, refer to reconfig Kernel Service.
struct dri_cpu {
        cpu_t           lcpu;           /* Logical CPU Id of target CPU */
        cpu_t           bcpu;           /* Bind Id of target CPU        */
};

struct dri_mem {
        size64_t        req_memsz_change;   /* user requested mem size  */
        size64_t        sys_memsz;          /* system mem size at start */
        size64_t        act_memsz_change;   /* mem added/removed so far */
        rpn64_t         sys_free_frames;    /* Number of free frames */
        rpn64_t         sys_pinnable_frames;/* Number of pinnable frames */
        rpn64_t         sys_total_frames;   /* Total number of frames */
        unsigned long long lmb_addr;        /* start addr of logical memory block */
        size64_t        lmb_size;           /* Size of logical memory block being added */
};

If the current DLPAR request is a partition migration, resource_info data is still passed to the handler, but kernel extensions do not need to examine its contents because the data is not used in this case.

Reconfiguration handlers are invoked in the process environment.

Kernel extensions can assume the following:
  • Only a single type of resource is being configured or removed at a time.
  • Multiple processors will not be specified at the same time. However, kernel extensions should be coded to support the addition or removal of multiple logical memory blocks. You can initiate a request to add or remove gigabytes of memory.

The check phase gives DLPAR-aware applications and kernel extensions the ability to react to the user's request before it is applied. The check-phase kernel extension handler is therefore invoked once, even though the request might devolve to multiple logical memory blocks. Unlike the check phase, the pre, post, and post-error phases are applied at the logical memory block level. This differs from application notification, where the pre phase, the post phase, or alternatively the post-error phase is invoked once for each user request, regardless of the number of underlying logical memory blocks. Another difference is that the post-error phase for kernel extensions is used when a specific logical memory block operation fails, whereas the post-error phase for applications is used when the operation, which in this case is the entire user request, fails.

In general, during the check phase, the kernel extension examines its state to determine whether it can comply with the impending DLPAR request. If the operation cannot be managed, or if it would adversely affect the proper execution of the extension, the handler returns DR_FAIL. Otherwise, the handler returns DR_SUCCESS.

During the pre-remove phase, kernel extensions attempt to remove any dependencies that they might have on the designated resource. An example is a driver that maintains per-processor buffer pools. The driver might mark the associated buffer pool as pending delete, so that new requests are not allocated from it. In time, the pool is drained and might be freed. Other items that must be considered in the pre-remove phase are timers and bound threads, which need to be stopped and terminated, respectively. Alternatively, bound threads can be unbound.

During the post-remove phase, kernel extensions attempt to free any resources through garbage collection, assuming that the resource was actually removed. If it was not, timers and threads must be reestablished. The DR_resource_POST_ERROR request is used to signify that an error occurred.

During the pre-add phase, kernel extensions must pre-initialize any data paths that are dependent on the new resource, so that when the new resource is configured, it is ready to be used. The system does not guarantee that the resource will not be used prior to the handler being called again in the post phase.

During the post-add phase, kernel extensions can assume that the resource has been properly added and can be used. This phase is a convenient place to start bound threads, schedule timers, and increase the size of buffers.
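
The following sketch shows how a memory-remove handler might act on these phases. It uses the reconfig_register_ext handler form; the phase constants are assumed to follow the DR_resource_PHASE naming pattern mentioned above (verify the exact names in sys/dr.h), and the my_* routines are hypothetical stand-ins for the extension's real buffer-pool, timer, and bound-thread management code.

#include <sys/types.h>
#include <sys/dr.h>

/* Hypothetical driver routines. */
extern int  my_can_release(void *adap, unsigned long long addr, size64_t size);
extern void my_quiesce_range(void *adap, unsigned long long addr, size64_t size);
extern void my_release_range(void *adap, unsigned long long addr, size64_t size);
extern void my_restore_range(void *adap, unsigned long long addr, size64_t size);

/* Illustrative memory-remove handler in the reconfig_register_ext form.
 * The phase constants are assumptions based on the DR_resource_PHASE
 * pattern described above. */
static int
my_mem_handler(void *event, void *h_arg,
               unsigned long long req, void *resource_info)
{
        struct dri_mem *mem = (struct dri_mem *)resource_info;

        switch (req) {
        case DR_MEM_REMOVE_CHECK:
                /* Refuse the request if this LMB cannot be given up. */
                return my_can_release(h_arg, mem->lmb_addr, mem->lmb_size) ?
                       DR_SUCCESS : DR_FAIL;
        case DR_MEM_REMOVE_PRE:
                /* Mark dependent buffer pools pending-delete, stop timers,
                 * and unbind or terminate bound threads. */
                my_quiesce_range(h_arg, mem->lmb_addr, mem->lmb_size);
                return DR_SUCCESS;
        case DR_MEM_REMOVE_POST:
                /* The LMB was removed: garbage-collect what remains. */
                my_release_range(h_arg, mem->lmb_addr, mem->lmb_size);
                return DR_SUCCESS;
        case DR_MEM_REMOVE_POST_ERROR:
                /* The remove failed: reestablish timers, threads, and pools. */
                my_restore_range(h_arg, mem->lmb_addr, mem->lmb_size);
                return DR_SUCCESS;
        default:
                return DR_SUCCESS;      /* events this extension does not track */
        }
}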

Kernel extensions can also be notified of memory removals or additions on a per-operation basis, much like applications, by registering for one or more of the _OP_ notification types. This enables a kernel extension to modify its resource usage in response to a memory DR operation only once per operation, rather than once per logical memory block (LMB).

DR_MEM_REMOVE_OP_PRE notification is sent before a memory remove. Reconfiguration handlers can start adjusting their resources in anticipation of the memory remove at this time. DR_MEM_REMOVE_OP_POST and DR_MEM_ADD_OP_POST notifications are sent after a memory remove or add operation, respectively, whether the operation failed or not. If the operation failed, act_memsz_change is 0.
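
Continuing the registration sketch shown earlier, a handler body for these per-operation events might look like the following. It assumes that the resource_info for the _OP_ events is delivered as a dri_mem structure; my_shrink_pools and my_resize_pools are hypothetical driver routines.

#include <sys/types.h>
#include <sys/dr.h>

/* Hypothetical driver routines that resize the extension's buffer pools. */
extern void my_shrink_pools(void *adap, size64_t bytes);
extern void my_resize_pools(void *adap, size64_t bytes);

/* Per-operation memory handler in the reconfig_register_list form. */
static int
my_dr_handler(void *event, void *h_arg,
              dr_kevent_t event_in_prog, void *resource_info)
{
        struct dri_mem *mem = (struct dri_mem *)resource_info;

        if (event_in_prog == DR_MEM_REMOVE_OP_PRE) {
                /* Invoked once per operation: shrink by the full amount
                 * that the user requested to remove. */
                my_shrink_pools(h_arg, mem->req_memsz_change);
        } else if (event_in_prog == DR_MEM_REMOVE_OP_POST ||
                   event_in_prog == DR_MEM_ADD_OP_POST) {
                /* act_memsz_change is 0 if the operation failed; otherwise
                 * it is the amount of memory actually added or removed. */
                my_resize_pools(h_arg, mem->act_memsz_change);
        }
        return DR_SUCCESS;
}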

Reconfiguration handlers return DR_SUCCESS to indicate successful reconfiguration, or DR_FAIL to indicate failure, within a few seconds if possible. If more time is required, the handler returns DR_WAIT.

Extended DR handlers

If a kernel extension expects an operation to take a long time, that is, several seconds, the handler returns DR_WAIT to the caller but proceeds with the request asynchronously. In this case, the handler indicates that it has completed the request by invoking the reconfig_handler_complete routine.
void reconfig_handler_complete(void *event, int rc);

The event parameter is the same parameter that was passed to the handler when it was invoked by the kernel. The rc parameter must be set to either DR_SUCCESS or DR_FAIL to indicate the completion status of the handler.

The reconfig_handler_complete kernel service can be invoked in the process or interrupt environments.
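
For example, a handler that must drain a large buffer pool before memory can be removed might save the event pointer, start the drain, and return DR_WAIT, completing the request later from the thread that finishes the work. The my_drv_state structure and the my_start_drain routine in this sketch are hypothetical.

#include <sys/types.h>
#include <sys/dr.h>

/* Hypothetical driver state used to carry the event to a worker thread. */
struct my_drv_state {
        void    *pending_dr_event;
        /* ... */
};

extern void my_start_drain(struct my_drv_state *drv);  /* hypothetical */

static int
my_slow_handler(void *event, void *h_arg,
                dr_kevent_t event_in_prog, void *resource_info)
{
        struct my_drv_state *drv = (struct my_drv_state *)h_arg;

        /* Draining the pool can take many seconds: answer DR_WAIT now and
         * let a worker thread finish the request asynchronously. */
        drv->pending_dr_event = event;
        my_start_drain(drv);
        return DR_WAIT;
}

/* Called by the worker when the drain finishes; reconfig_handler_complete
 * can be invoked from the process or interrupt environment. */
static void
my_drain_done(struct my_drv_state *drv, int ok)
{
        reconfig_handler_complete(drv->pending_dr_event,
                                  ok ? DR_SUCCESS : DR_FAIL);
        drv->pending_dr_event = NULL;
}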

Using the xmemdma kernel service

On systems that are capable of DLPAR operations, such as the dynamic removal of memory, calls to the xmemdma kernel service without the XMEM_DR_SAFE flag result in the specified memory being flagged as not removable. This is done to guarantee the integrity of the system, because the system has no knowledge of how the caller intends to use the real memory address that was returned. Dynamic memory removal operations remain possible for other memory, but not for the memory that the xmemdma call specifies.

If the caller is using the real memory address only for informational purposes, such as for trace buffers or debug information, then the caller can set the XMEM_DR_SAFE flag. This is an indication to the system that the real memory address can be exposed to the caller without any risk of data corruption. When this flag is present, the system will still permit the specified memory to be dynamically removed.

If the caller is using the real memory address to perform actual data access, either by turning off data translation and performing CPU load or store access to the real memory, or by programming direct memory access (DMA) controllers to target the real memory, the XMEM_DR_SAFE flag must not be set. If the flag is set, the system's data integrity could be compromised when the memory is dynamically removed. For information on converting a kernel extension that uses real memory addresses in this way to be DLPAR-aware, contact your IBM® Service Representative.

For more information, see the xmemdma kernel service.

Controlling memory DLPAR notification for applications

Dynamic addition or removal of memory from an LPAR that runs multiple DLPAR-aware programs can result in contention for resources. By default, each program is notified of the entire resource change. For example, if 1 GB of memory is removed from an LPAR running two DR-aware programs, each program is notified that 1 GB of memory has been removed. Because the two programs are generally unaware of each other, both of them scale down their memory use by 1 GB, leading to inefficiency. A similar efficiency problem can occur when new memory is added.

To overcome this problem, AIX® allows application scripts to be installed with a percentage factor that indicates how much of the actual memory resource change each application is notified of when a memory DLPAR operation occurs. When you install the application scripts by using the drmgr command, you can specify this percentage factor with the DR_MEM_PERCENT name=value pair. The application script must output this name=value pair when it is invoked by the drmgr command with the scriptinfo subcommand. The value must be an integer between 1 and 100. Any value outside this range is ignored, and the default value, which is 100, is used. You can also set this name=value pair as an environment variable at the time of installation; in that case, the value from the environment variable overrides the value provided by the application script.

Similarly, in applications using the SIGRECONFIG signal handler and dr_reconfig() system call, you can control the memory DLPAR notification by setting the DR_MEM_PERCENT name=value pair as an environment variable before the application begins running. This value, however, cannot be changed without restarting the application.
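
A minimal user-space sketch of this path is shown below. It assumes that the dr_reconfig() interface takes a DR_QUERY flag and a dr_info_t pointer, as described for the dr_reconfig subroutine; the DR_MEM_PERCENT variable itself would be exported in the process environment (for example, from the shell) before the program is started.

#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/dr.h>

/* SIGRECONFIG handler: query details of the DLPAR event in progress.
 * The memory change reported to this process reflects the DR_MEM_PERCENT
 * value that was in its environment when it started. */
static void
reconfig_handler(int sig)
{
        dr_info_t info;

        if (dr_reconfig(DR_QUERY, &info) != 0)
                return;                 /* could not obtain event details */

        /* Inspect the dr_info_t fields (declared in <sys/dr.h>) and adjust
         * the application's memory use accordingly. */
}

int
main(void)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = reconfig_handler;
        sigemptyset(&sa.sa_mask);
        (void)sigaction(SIGRECONFIG, &sa, NULL);

        for (;;)                        /* application work would go here */
                pause();
}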