Actions taken by DLPAR scripts

Application scripts are started for both add and remove operations.

When removing resources, scripts are provided to resolve conditions imposed by the application that prevent the resource from being removed. The presence of particular processor bindings and the lack of pinnable memory might cause a remove request to fail. A set of commands is provided to identify these situations, so that scripts can be written to resolve them.

To identify and resolve the DLPAR dependencies, the following commands can be used:
  • The ps command displays bindprocessor attachments and plock system call status at the process level.
  • The bindprocessor command displays online processors and makes new attachments.
  • The kill command sends signals to processes.
  • The ipcs command displays pinned shared-memory segments at the process level.
  • The lsrset command displays processor sets.
  • The lsclass command displays Workload Manager classes, which might include processor sets.
  • The chclass command is used to change class definitions.

Scripts can also be used for scalability and general performance issues. When resources are removed, you can reduce the number of threads that are used or the size of application buffers. When the resources are added, you can increase these parameters. You can provide commands that can be used to dynamically make these adjustments, which can be triggered by these scripts. Install the scripts to start these commands within the context of the DLPAR operation.

High-level structure of DLPAR scripts

This section provides an overview of the scripts, which can be Perl scripts, shell scripts, or commands. Application scripts are required to provide the following commands:
  • scriptinfo

    Identifies the version, date, and vendor of the script. It is called when the script is installed.

  • register

    Identifies the resources managed by the script. If the script returns the resource name cpu, mem, capacity, or var_weight, the script is automatically started when DLPAR attempts to reconfigure processors, memory, entitled capacity, or variable weight. The register command is called when the script is installed with the DLPAR subsystem.

  • usage resource_name

    Returns information describing how the resource is being used by the application. The description should be relevant so that the user can determine whether to install or uninstall the script. It should identify the software capabilities of the application that are impacted. The usage command is called for each resource that was identified by the register command.

  • checkrelease resource_name

    Indicates whether the DLPAR subsystem should continue with the removal of the named resource. A script might indicate that the resource should not be removed if the application is not DLPAR-aware and the application is considered critical to the operation of the system.

  • prerelease resource_name

    Reconfigures, suspends, or terminates the application so that its hold on the named resource is released.

  • postrelease resource_name

    Resumes or restarts the application.

  • undoprerelease resource_name

    Started if an error is encountered and the resource is not released.

  • checkacquire resource_name

    Indicates whether the DLPAR subsystem should proceed with the resource addition. It might be used by a license manager to prevent the addition of a new resource, for example cpu, until the resource is licensed.

  • preacquire resource_name

    Used to prepare for a resource addition.

  • undopreacquire resource_name

    Started if an error is encountered in the preacquire phase or when the event is acted on.

  • postacquire resource_name

    Resumes or starts the application.

  • preaccevent resource_name

    Used to prepare a DLPAR update.

  • postaccevent resource_name

    Resumes or starts the application.

  • undopreaccevent resource_name

    Started if an error is encountered in the preaccevent phase or when the event is acted upon.

  • pretopologyupdate resource_name

    Used to prepare for a system topology update.

  • posttopologyupdate resource_name

    Resumes or starts the application.
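
As an illustration of this structure, the following shell sketch dispatches on the first command argument. It is a minimal sketch, not a complete script. The function name dlpar_main, the output names DR_VERSION, DR_DATE, DR_VENDOR, DR_RESOURCE, and DR_USAGE, the placeholder values, and the return codes are assumptions made for this example and should be checked against the drmgr documentation for your release.

```shell
#!/bin/sh
# Hypothetical skeleton of a DLPAR application script.
# Assumption: the DR_* output names and the return codes are illustrative.

dlpar_main() {
    cmd=$1
    resource=$2
    case "$cmd" in
        scriptinfo)
            # Version, date, and vendor of the script (placeholder values).
            echo "DR_VERSION=1"
            echo "DR_DATE=01012024"
            echo "DR_VENDOR=sysadmin"
            ;;
        register)
            # Resources that this script wants to be called for.
            echo "DR_RESOURCE=cpu"
            echo "DR_RESOURCE=mem"
            ;;
        usage)
            # Human-readable description of how the resource is used.
            echo "DR_USAGE=\"example application uses $resource\""
            ;;
        checkrelease|prerelease|postrelease|undoprerelease)
            # Remove sequence: nothing to do in this sketch.
            :
            ;;
        checkacquire|preacquire|postacquire|undopreacquire)
            # Add sequence: nothing to do in this sketch.
            :
            ;;
        *)
            echo "DR_ERROR=\"unsupported command: $cmd\""
            return 1
            ;;
    esac
    return 0
}
```

A real script would typically end by calling dlpar_main "$@" and exiting with its return code.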

Installing application scripts using the drmgr command

The drmgr command maintains an internal database of installed-script information. This information is collected when the system is booted and is refreshed when scripts are installed or uninstalled. The information is derived from the scriptinfo, register, and usage commands. Scripts are installed through the drmgr command, which copies the named script to the script repository, where it can be accessed later. The default location for this repository is /usr/lib/dr/scripts/all. Within workload partitions, the default script repository location is /var/dr/scripts. You can specify an alternate location for this repository. To restrict a script to a particular machine, specify the target host name when installing the script.

To specify the location of the base repository, use the following command:
drmgr -R base_directory_path
To install a script, use the following command:
drmgr -i script_name [-f] [-w mins] [-D hostname]
The following flags are defined:
  • The -i flag is used to name the script.
  • The -f flag must be used to replace a registered script.
  • The -w flag is used to specify the number of minutes that the script is expected to execute. This is provided as an override option to the value specified by the vendor.
  • The -D flag is used to register a script to be used on a particular host.
To uninstall a script, use the following command:
drmgr -u script_name [-D hostname]
The following flags are defined:
  • The -u flag is used to indicate which script should be uninstalled.
  • The -D flag is used to uninstall a script that has been registered for a specific host.
To display information about scripts that have already been installed, use the following command:
drmgr -l
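
To show how the install flags combine, the following sketch builds a drmgr install command line and prints it instead of running it. It uses only the -i, -f, -w, and -D flags described above. The function name drmgr_install_cmd and its argument order are hypothetical, as are the script and host names in the usage line.

```shell
#!/bin/sh
# Print (do not run) a drmgr install command line.
# Arguments: script_name [replace: yes|no] [minutes] [hostname]
drmgr_install_cmd() {
    script=$1
    replace=${2:-no}
    minutes=${3:-}
    host=${4:-}

    cmd="drmgr -i $script"
    if [ "$replace" = "yes" ]; then
        cmd="$cmd -f"             # replace an already registered script
    fi
    if [ -n "$minutes" ]; then
        cmd="$cmd -w $minutes"    # override the vendor's expected run time
    fi
    if [ -n "$host" ]; then
        cmd="$cmd -D $host"       # register the script for one host only
    fi
    echo "$cmd"
}
```

For example, drmgr_install_cmd sysadmin_wlm yes 5 host1 prints drmgr -i sysadmin_wlm -f -w 5 -D host1.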

Naming conventions for scripts

It is suggested that the script names be built from the vendor name and the subsystem that is being controlled. System administrators should name their scripts with the sysadmin prefix. For example, a system administrator who wanted to provide a script to control Workload Manager assignments might name the script sysadmin_wlm.

Script execution environment and input parameters

Scripts are started with the following execution environment:
  • Process UID is set to the UID of the script.
  • Process GID is set to the GID of the script.
  • PATH environment variable is set to /usr/bin:/etc:/usr/sbin.
  • LANG environment variable might or might not be set.
  • Current working directory is set to /tmp.
  • Command arguments and environment variables are used to describe the DLPAR event.

Scripts receive input parameters through command arguments and environment variables, and provide output by writing name=value pairs to standard output, one pair per line. The name identifies the return data item that is expected, and value is the value associated with that data item. Text strings must be enclosed in double quotation marks; for example, DR_ERROR="text". All environment variables and name=value pairs must begin with DR_, which is reserved for communicating with application scripts.

Scripts use the DR_ERROR name=value pair to provide error descriptions.
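
The output convention can be sketched with two small helper functions. This is an illustration only: the names dr_emit and dr_fail are hypothetical, and the helpers assume that every value is a text string that needs quotation marks.

```shell
#!/bin/sh
# Write one name=value pair to standard output, with the value quoted.
# Rejects names that do not begin with the reserved DR_ prefix.
dr_emit() {
    name=$1
    shift
    case "$name" in
        DR_*)
            printf '%s="%s"\n' "$name" "$*"
            ;;
        *)
            echo "dr_emit: name must begin with DR_: $name" >&2
            return 1
            ;;
    esac
}

# Report an error description and return a failure status.
dr_fail() {
    dr_emit DR_ERROR "$@"
    return 1
}
```

A command handler could then end with a line such as dr_fail "cannot release cpu" to report the error text and a failing status together.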

You can examine the command arguments to the script to determine the phase of the DLPAR operation, the type of action, and the type of resource that is the subject of the pending DLPAR request. For example, if the script command arguments are checkrelease mem, then the phase is check, the action is remove, and the type of resource is memory. The specific resource that is involved can be identified by examining environment variables.
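
The mapping from command arguments to phase, action, and resource type can be sketched as follows. The function names are hypothetical, and only the release and acquire commands and the resource names cpu, mem, capacity, and var_weight are covered.

```shell
#!/bin/sh
# Derive the phase from the command name (first script argument).
dr_phase() {
    case "$1" in
        check*)   echo check ;;
        undopre*) echo undo ;;
        pre*)     echo pre ;;
        post*)    echo post ;;
        *)        return 1 ;;
    esac
}

# Derive the action from the command name.
dr_action() {
    case "$1" in
        *release) echo remove ;;
        *acquire) echo add ;;
        *)        return 1 ;;
    esac
}

# Translate the resource name (second script argument).
dr_resource() {
    case "$1" in
        cpu)        echo processor ;;
        mem)        echo memory ;;
        capacity)   echo "entitled capacity" ;;
        var_weight) echo "variable weight" ;;
        *)          return 1 ;;
    esac
}
```

With the arguments checkrelease mem, these functions yield check, remove, and memory.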

The following environment variables are set for memory add and remove:
Note: In the following description, one frame is equal to 4 KB.
  • DR_FREE_FRAMES=0xFFFFFFFF

    The number of free frames currently in the system, in hexadecimal format.

  • DR_MEM_SIZE_COMPLETED=n

    The number of megabytes that were successfully added or removed, in decimal format.

  • DR_MEM_SIZE_REQUEST=n

    The size of the memory request in megabytes, in decimal format.

  • DR_PINNABLE_FRAMES=0xFFFFFFFF

    The total number of pinnable frames currently in the system, in hexadecimal format. This parameter provides valuable information when removing memory in that it can be used to determine when the system is approaching the limit of pinnable memory, which is the primary cause of failure for memory remove requests.

  • DR_TOTAL_FRAMES=0xFFFFFFFF

    The total number of frames currently in the system, in hexadecimal format.
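
Because the frame counts are hexadecimal and the request size is in megabytes, a script usually has to convert between the two. The following sketch does that conversion, using the 4 KB frame size from the note above. The function names are hypothetical, and comparing free memory with the request size is an illustrative heuristic rather than a documented rule.

```shell
#!/bin/sh
# Convert a frame count (hexadecimal with 0x prefix, or decimal) to
# megabytes. One frame is 4 KB, so MB = frames * 4 / 1024 (rounded down).
dr_frames_to_mb() {
    echo $(( ($1) * 4 / 1024 ))
}

# Heuristic (assumption): succeed if the free memory reported in
# DR_FREE_FRAMES is at least as large as DR_MEM_SIZE_REQUEST.
dr_free_covers_request() {
    free_mb=$(dr_frames_to_mb "$DR_FREE_FRAMES")
    [ "$free_mb" -ge "$DR_MEM_SIZE_REQUEST" ]
}
```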

The following environment variables are set for processor add and remove:
  • DR_BCPUID=N

    The bind CPU ID of the processor that is being added or removed in decimal format. A bindprocessor attachment to this processor does not necessarily mean that the attachment has to be undone. This is only true if it is the Nth processor in the system, because the Nth processor position is the one that is always removed in a Central Processing Unit (CPU) remove operation. Bind IDs are consecutive in nature, ranging from 0 to N and are intended to identify only online processors. Use the bindprocessor command to determine the number of online CPUs.

  • DR_LCPUID=N

    The logical CPU ID of the processor that is being added or removed, in decimal format.
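
As a sketch of how DR_BCPUID might be used, the following function reads lines of the form "pid bind_id" from standard input and prints the process IDs whose attachment refers to the bind ID named in DR_BCPUID. The input format and the function name are assumptions made for this example; producing such a list from the ps command output is left to the script author.

```shell
#!/bin/sh
# Read "pid bind_id" lines from standard input and print each pid that
# is bound to the processor identified by DR_BCPUID. A bind_id of "-"
# (hypothetical marker for an unbound process) never matches.
dr_pids_bound_to_target() {
    while read -r pid bnd; do
        if [ "$bnd" = "$DR_BCPUID" ]; then
            echo "$pid"
        fi
    done
    return 0
}
```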

The following environment variables are set for Micro-Partitioning®:
  • DR_CPU_CAPACITY=N

    The partition's percentage of shared physical processors.

  • DR_VAR_WEIGHT=N

    The partition's relative priority for determining how to allocate shared pool idle cycles.

  • DR_CPU_CAPACITY_DELTA=N

    The difference between the current value of the partition's percentage of shared physical processors and the value to which it will be changed when this operation is complete.

  • DR_VAR_WEIGHT_DELTA=N

    The difference between the current value of the partition's variable weight and the value to which it will be changed when this operation is complete.

The operator can display the information about the current DLPAR request using the detail level at the HMC to observe events as they occur. This parameter is specified to the script using the DR_DETAIL_LEVEL=N environment variable, where N can range from 0 to 5. The default value is zero (0) and signifies no information. A value of one (1) is reserved for the operating system and is used to present the high-level flow. The remaining levels (2-5) can be used by the scripts to provide information with the assumption that larger numbers provide greater detail.

Scripts provide detailed data by writing the following name=value pairs to standard output:
name=value pair          Description
DR_LOG_ERR=message       Logs the message at the LOG_ERR syslog level.
DR_LOG_WARNING=message   Logs the message at the LOG_WARNING syslog level.
DR_LOG_INFO=message      Logs the message at the LOG_INFO syslog level.
DR_LOG_EMERG=message     Logs the message at the LOG_EMERG syslog level.
DR_LOG_DEBUG=message     Logs the message at the LOG_DEBUG syslog level.
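
A script can use DR_DETAIL_LEVEL to decide how much of this data to write. The following sketch emits a DR_LOG_INFO pair only when the requested detail level is at least the level assigned to the message. The function name dr_log_info and the choice to quote the message are assumptions for this example.

```shell
#!/bin/sh
# Emit an informational message if the operator asked for enough detail.
# Arguments: level (2-5 are available to scripts), then the message text.
dr_log_info() {
    level=$1
    shift
    # Treat an unset DR_DETAIL_LEVEL as 0, which means no information.
    if [ "${DR_DETAIL_LEVEL:-0}" -ge "$level" ]; then
        printf 'DR_LOG_INFO="%s"\n' "$*"
    fi
    return 0
}
```

With DR_DETAIL_LEVEL=3, a call such as dr_log_info 2 "resizing pool" writes the pair, while dr_log_info 4 "per-thread detail" writes nothing.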

The operator can also preserve this information in a log by using the syslog facility, in which case the preceding information is routed to that facility as well. The syslog facility must be configured for this purpose.

DLPAR script commands

This section describes the script commands for DLPAR:
scriptinfo
Provides information about the installed scripts, such as their creation date and resources.
register
Started to collect the list of resources that are managed by the script. The drmgr command uses these lists to start scripts based on the type of resource that is being reconfigured.
usage
Provides human-readable strings describing the service provided by the named resource. The context of the message should help the user decide the implications on the application and the services that it provides when the named resource is reconfigured. This command is started when the script is installed, and the information provided by this command is maintained in an internal database that is used by the drmgr command. Display the information by using the -l list option of the drmgr command.
checkrelease
When removing resources, the drmgr command assesses the impacts of the removal of the resource. This includes execution of DLPAR scripts that implement the checkrelease command. Each DLPAR script in turn can evaluate the peculiarities of its application and indicate to the drmgr command, through the script's return code, whether the resource removal will affect the associated application. If the script finds that the removal of the resource can be done safely, it returns an exit status of success. If the application is in a state in which the resource is critical to its execution and cannot be reconfigured without interrupting the execution of the application, the script indicates that the resource should not be removed by returning an error. When the FORCE option is specified by the user, which applies to the entire DLPAR operation including its phases, the drmgr command skips the checkrelease command and begins with the prerelease commands.
prerelease
Before a resource is released, the DLPAR scripts are directed to assist in the release of the named resource by reducing or eliminating the use of the resource from the application. However, if the script detects that the resource cannot be released from the application, it should indicate that the resource will not be removed from the application by returning an error. This does not prevent the system from attempting to remove the resource in either the forced or non-forced mode of execution, and the script will be called in the post phase, regardless of the actions or inactions that were taken by the prerelease command. The actions taken by the operating system are safe. If a resource cannot be cleanly removed, the operation will fail.

The DLPAR script is expected to internally record the actions that were taken by the prerelease command, so that they can be restored in the post phase should an error occur. This can also be managed in the post phase if rediscovery is implemented. The application might need to take severe measures if the force option is specified.

postrelease
After a resource is successfully released, the postrelease command for each installed DLPAR script is started. Each DLPAR script performs any post processing that is required during this step. Applications that were halted should be restarted.

The calling program ignores any errors reported by the postrelease commands, and the operation is considered a success, although an indication of any errors that occurred is also reported to the user. The DR_ERROR name=value pair is provided for this purpose, so the message should identify the application that was not properly reconfigured.

undoprerelease
After a prerelease command is issued by the drmgr command to the DLPAR script, if the drmgr command fails to remove or release the resource, it will try to revert to the old state. As part of this process, the drmgr command will issue the undoprerelease command to the DLPAR script. The undoprerelease command will only be started if the script was previously called to release the resource in the current DLPAR request. In this case, the script should undo any actions that were taken by the prerelease command of the script. To this end, the script might need to document its actions, or otherwise provide the capability of rediscovering the state of the system and reconfiguring the application, so that in effect, the DLPAR event never occurred.
checkacquire
This command is the first DLPAR script-based command that is called in the acquire-new-resource sequence. It is called for each installed script that previously indicated that it supported the particular type of resource that is being added. One of the primary purposes of the checkacquire phase is to enable processor-based license managers, which might want to fail the addition of a processor. The checkacquire command is always started, regardless of the value of the FORCE environment variable, and the calling program honors the return code of the script. The user cannot force the addition of a new processor if a script or DLPAR-aware program fails the DLPAR operation in the check phase.

In short, the FORCE environment variable does not really apply to the checkacquire command, although it does apply to the other phases. In the preacquire phase, it dictates how far the script should go when reconfiguring the application. The force option can be used by the scripts to control the policy by which applications are stopped and restarted similar to when a resource is released, which is mostly a DLPAR-safe issue.

preacquire
Assuming that no errors were reported in the checkacquire phase, the system advances to the preacquire phase, where the same set of scripts is invoked to prepare for the acquisition of a new resource, which is supported through the preacquire command. Each of these scripts is called before the system actually attempts to integrate the resource, unless an error was reported and the FORCE environment variable was not specified by the user. If the FORCE environment variable was specified, the system proceeds to the integrate stage, regardless of the script's stated return code. No errors are detected when the FORCE environment variable is specified, because all errors are avoidable by unconfiguring the application, which is an accepted practice when the FORCE environment variable is specified. If an error is encountered and the FORCE environment variable is not specified, the system proceeds to the undopreacquire phase, but only the previously executed scripts in the current phase are rerun. During this latter phase, the scripts are directed to perform recovery actions.
undopreacquire
The undopreacquire phase is provided so that the scripts can perform recovery operations. If a script is called in the undopreacquire phase, it can assume that it successfully completed the preacquire command.
postacquire
The postacquire command is executed after the resource has been successfully integrated by the system. Each DLPAR script that was previously called in the check and pre phases is called again. This command is used to incorporate the new resource into the application. For example, the application might want to create new threads or expand its buffers, or the application might need to be restarted if it was previously halted.
checkmigrate
This command is the first DLPAR script-based command that is called in the migration sequence. It is called for each installed script that previously indicated that it supported the particular type of resource that is being added. The checkmigrate command is always started, regardless of the value of the FORCE environment variable, and the calling program honors the return code of the script. The user cannot force the partition migration if a script or DLPAR-aware program fails the DLPAR operation in the check phase.
premigrate
Assuming that no errors were reported in the checkmigrate phase, the system advances to the premigrate phase, where the same set of scripts is started to prepare for the partition migration. Each of these scripts is called before the system actually attempts to migrate the partition. Regardless of the script's stated return code, the system proceeds to the migration stage. If an error is encountered, the system proceeds to the undopremigrate phase, but only the previously executed scripts in the current phase are rerun. During this latter phase, the scripts are directed to perform recovery actions.
undopremigrate
The undopremigrate phase is provided so that the scripts can perform recovery operations. If a script is called in the undopremigrate phase, it can assume that it successfully completed the premigrate command.
postmigrate
The postmigrate command is executed after the partition has been successfully migrated. Each DLPAR script that was previously called in the check and pre phases is called again.
pretopologyupdate
The pretopologyupdate command is executed before an action takes place that will affect the topology of the partition, such as the addition or removal of processors or memory. This command is meant to inform the scripts that a topology action is about to take place, and it cannot fail the operation. The system proceeds to the integrate stage, regardless of the script's stated return code.
posttopologyupdate
The posttopologyupdate command is executed after the partition has successfully completed the topology action. Each DLPAR script that was previously called in the pre phase is called again.
checkhibernate
This command is the first DLPAR script-based command that is called in the hibernation sequence. It is called for each installed script that previously indicated that it supported the particular type of resource that is being added. The checkhibernate command is always started, regardless of the value of the FORCE environment variable, and the calling program honors the return code of the script. The user cannot force the partition hibernation if a script or DLPAR-aware program fails the DLPAR operation in the check phase.
prehibernate
Assuming that no errors were reported in the checkhibernate phase, the system advances to the prehibernate phase, where the same set of scripts is started to prepare for the partition hibernation. Each of these scripts is called before the system actually attempts to hibernate the partition. Regardless of the script's stated return code, the system proceeds to the hibernation stage. If an error is encountered, the system proceeds to the undohibernate phase, but only the previously executed scripts in the current phase are rerun. During this latter phase, the scripts are directed to perform recovery actions.
undohibernate
The undohibernate phase is provided so that the scripts can perform recovery operations. If a script is called in the undohibernate phase, it can assume that it successfully completed the prehibernate command.
posthibernate
The posthibernate command is executed after the partition has been successfully hibernated. Each DLPAR script that was previously called in the check and pre phases is called again.
preaccevent
This command is the first DLPAR script-based command that is called in the encryption accelerator DLPAR sequence. It is called for each installed script that previously indicated that it supported the particular type of resource that is being added or released. It is unknown at the time of this event if the following action will be an add or release of the encryption accelerator. That action will be provided during one of the following post phases.
postaccevent
The postaccevent command is executed after the resource has been successfully processed by the system. Each DLPAR script that was previously called in the pre phase is called again. This command is used to incorporate the new resource state into the application.
undoaccevent
The undoaccevent phase is provided so that the scripts can perform recovery operations. If a script is called in the undoaccevent phase, it can assume that it successfully completed the preaccevent command.
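
To illustrate the remove sequence described earlier in this section (checkrelease, prerelease, undoprerelease, and postrelease), the following sketch records the prerelease action in a state file so that undoprerelease can reverse it. Everything here is hypothetical: the application is simulated by a file that holds a thread count, and the file names, the function name release_cmd, the halving policy, and the minimum of two threads are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical application state: a file that holds the thread count.
APP_THREADS_FILE=${APP_THREADS_FILE:-/tmp/example_app.threads}
# Where prerelease records what it changed, for use by undoprerelease.
STATE_FILE=${STATE_FILE:-/tmp/example_app.dlpar_state}

release_cmd() {
    case "$1" in
        checkrelease)
            # Refuse the removal if the application cannot shrink further.
            if [ "$(cat "$APP_THREADS_FILE")" -lt 2 ]; then
                echo 'DR_ERROR="example_app cannot run with fewer threads"'
                return 1
            fi
            ;;
        prerelease)
            # Record the current setting, then halve the thread count.
            cp "$APP_THREADS_FILE" "$STATE_FILE" || return 1
            echo $(( $(cat "$STATE_FILE") / 2 )) > "$APP_THREADS_FILE"
            ;;
        undoprerelease)
            # Restore the recorded setting if prerelease saved one.
            if [ -f "$STATE_FILE" ]; then
                mv "$STATE_FILE" "$APP_THREADS_FILE" || return 1
            fi
            ;;
        postrelease)
            # The removal succeeded; the recorded state is no longer needed.
            rm -f "$STATE_FILE"
            ;;
    esac
    return 0
}
```

Recording the state in a file is one way to satisfy the expectation that a script can undo its own prerelease actions; a script that can rediscover the application state would not need the file.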