DLPAR-safe and aware programs
A DLPAR-safe program is one that does not fail as a result of DLPAR operations.
Its performance might suffer when resources are removed, and it might not scale with the addition of new resources, but the program still works as expected. In fact, a DLPAR-safe program can prevent a DLPAR operation from succeeding because it has a dependency that the operating system is obligated to honor.
A DLPAR-aware program can learn about changes to the system configuration in one of the following ways:
- By regularly polling the system to discover changes in the system topology
- By registering application-specific code that is notified when the system topology is changing
DLPAR-aware programs must be designed, at minimum, to avoid introducing conditions that might cause DLPAR operations to fail. At maximum, DLPAR-aware programs are concerned with performance and scalability. This is a much more complicated task because buffers might need to be drained and resized to maintain expected levels of performance when memory is added or removed. In addition, the number of threads must be dynamically adjusted to account for changes in the number of online processors. These thread-based adjustments are not necessarily limited to processor-based decisions. For example, the best way to reduce memory consumption in Java programs might be to reduce the number of threads, because this reduces the number of active objects that need to be processed by the Java™ Virtual Machine's garbage collector.
Most applications are DLPAR-safe by default.
Making programs DLPAR-safe
- If a program has code that is optimized for uniprocessor systems and the number of processors in the partition is increased from one to two, runtime checks for the uniprocessor case might take an unexpected path if a processor is added while one of these checks is in progress. Problems can also occur in programs that implement their own locking primitives with uniprocessor serialization techniques, that is, without the sync and isync instructions. These instructions are also required for self-modifying and generated code, and are therefore necessary on DLPAR-enabled systems.
Be sure to look for uniprocessor-based logic. Programs that make uniprocessor assertions must include logic that identifies the number of online processors. Programs can determine the number of online processors by:
- Loading the _system_configuration.ncpus field
- Reading the var.v_ncpus field
- Using the sysconf system call with the _SC_NPROCESSORS_ONLN flag
- Programs that index data by processor number typically use the mycpu system call to determine the identity of the currently executing processor, and then use that identity to index into their data structures. A problem can occur when a new processor is added, because the data for that processor might not be initialized or allocated. Programs that preallocate processor-based lists using the number of online CPUs are broken, because this value changes with DLPAR. Avoid this problem by preallocating processor-based data using the maximum number of processors that can be online at the same time. The operating system is configured to support a maximum of N processors, which does not mean that N processors are active at any given time. The maximum number of processors is constant, while the number of online processors rises and falls as processors are brought online and taken offline. When a partition is created, the minimum, desired, and maximum numbers of processors are specified. The maximum value is reflected in the following variables:
- _system_configuration.max_ncpus
- _system_configuration.original_ncpus
- var.v_ncpus_cfg
- sysconf (_SC_NPROCESSORS_CONF)
The _system_configuration.original_ncpus and var.v_ncpus_cfg variables are preexisting variables. On DLPAR-enabled systems they represent a potential maximum value. On systems not enabled for DLPAR, the value is dictated by the number of processors that are configured at boot time. Both represent the conceptual maximum value that can be supported, even though a processor might have been taken offline by Dynamic Processor Deallocation. The use of these preexisting fields is recommended for applications that are built on AIX® 4.3, because this facilitates the use of the same binary on AIX 4.3 and later. If the application requires runtime initialization of its processor-based data, it can register a DLPAR handler that is called before a processor is added.
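The distinction between the online count and the configured maximum can be illustrated with the sysconf flags named above. The following is a minimal sketch in C, not a definitive implementation. The per_cpu_stats structure and the function names online_cpus, max_cpus, and alloc_per_cpu_stats are hypothetical and exist only for illustration. The sketch sizes processor-indexed data by the configured maximum rather than by the number of processors that happen to be online.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-processor record; the field is illustrative only. */
struct per_cpu_stats {
    unsigned long events;
};

/* Number of processors currently online. This value can change under DLPAR,
 * so it must not be used to size processor-indexed data. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;   /* fall back to 1 if the query fails */
}

/* Maximum number of processors the partition is configured to support. */
long max_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? n : 1;   /* fall back to 1 if the query fails */
}

/* Allocate one zero-initialized slot per possible processor and report
 * the slot count through *slots. Returns NULL if allocation fails. */
struct per_cpu_stats *alloc_per_cpu_stats(long *slots)
{
    assert(slots != NULL);
    *slots = max_cpus();
    return calloc((size_t)*slots, sizeof(struct per_cpu_stats));
}
```

Because every slot is allocated and zeroed up front, the intent is that a processor added later already has a valid entry, and no runtime initialization is needed when it comes online.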
Making programs DLPAR-aware
A DLPAR-aware program is one that is designed to recognize and dynamically adapt to changes in the system configuration. This code need not subscribe to the DLPAR model of awareness, but can be structured more generally in the form of a system resource monitor that regularly polls the system to discover changes in the system configuration. This approach can be used to achieve some limited performance-related goals, but because it is not tightly integrated with DLPAR, it cannot be effectively used to manage large-scale changes to the system configuration. For example, the polling model might not be suitable for a system that supports processor hot plug, because the hot-pluggable unit might be composed of several processor and memory boards. Nor can it be used to manage application-specific dependencies, such as processor bindings, that need to be resolved before the DLPAR processor remove event is started.
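The polling model described above can be sketched as follows. This is a hedged illustration in C under stated assumptions: the topology structure, the topology_cb callback type, and the functions read_topology and poll_topology are hypothetical names, and only the online processor count is tracked. A real monitor would also track memory and would choose its own polling interval.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Snapshot of the resources this hypothetical monitor tracks. */
struct topology {
    long online_cpus;
};

/* Hypothetical callback invoked when the snapshot changes. */
typedef void (*topology_cb)(const struct topology *before,
                            const struct topology *after);

/* Read the current configuration into *t. */
void read_topology(struct topology *t)
{
    assert(t != NULL);
    t->online_cpus = sysconf(_SC_NPROCESSORS_ONLN);
}

/* Compare the current configuration with the last snapshot.
 * Returns 1 and updates *last if a change was seen, 0 otherwise.
 * The callback, if not NULL, runs before *last is updated. */
int poll_topology(struct topology *last, topology_cb on_change)
{
    struct topology now;

    assert(last != NULL);
    read_topology(&now);
    if (now.online_cpus == last->online_cpus)
        return 0;
    if (on_change != NULL)
        on_change(last, &now);
    *last = now;
    return 1;
}
```

A monitor thread would call read_topology once at startup and then call poll_topology at a fixed interval. A change is seen only after it has already happened, which is why this model cannot resolve dependencies that must be cleared before a remove event starts.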
The following types of applications are candidates for DLPAR awareness:
- Applications that are designed to scale with the system configuration, including those:
  - That detect the number of online processors or the size of physical memory when the application starts
  - That are externally directed to scale based on an assumed configuration of processors and memory, which usually translates into the use of a maximum number of threads, maximum buffer sizes, or a maximum amount of pinned memory
- Applications that are aware of the number of online processors and the total quantity of system memory, including the following types of applications:
  - Performance monitors
  - Debugging tools
  - System crash tools
  - Workload managers
  - License managers
  Note: Not all license managers are candidates for DLPAR, especially user-based license managers.
- Applications that pin their application data, text, or stack using the plock system call
- Applications that use System V Shared Memory Segments with the PIN option (SHM_PIN)
- Applications that bind to processors using the bindprocessor system call
Dynamic logical partitioning of large memory pages is not supported. The amount of memory that is preallocated to the large page pool can have a material impact on the DLPAR capabilities of the partition regarding memory. A memory region that contains a large page cannot be removed. Therefore, application developers might want to provide an option to not use large pages.
Making programs DLPAR-aware using DLPAR APIs
Application interfaces are provided to make programs DLPAR-aware. The SIGRECONFIG signal is sent to applications at the various phases of dynamic logical partitioning. The DLPAR subsystem defines check, pre, and post phases for a typical operation. Applications can watch for this signal and use the DLPAR-supported system calls to learn more about the operation in progress and to take any necessary actions.
The issue of timely signal delivery can be managed by the application by controlling the signal mask and scheduling priority. The DLPAR-aware code can be directly incorporated into the algorithm. Also, the signal handler can be cascaded across multiple shared libraries so that notification can be incorporated in a more modular way.
To integrate DLPAR events using these APIs, do the following:
- Catch the SIGRECONFIG signal by using the sigaction system call. The default action is to ignore the signal.
- Control the signal mask in at least one of the threads so that the signal can be delivered in real time.
- Ensure that the scheduling priority of the thread that is to receive the signal is high enough that it runs quickly after the signal has been sent.
- Run the dr_reconfig system call to obtain the type of resource, the type of action, the phase of the event, and other information that is relevant to the current request.
  Note: The dr_reconfig system call is used inside the signal handler to determine the nature of the DLPAR request.