Change Cluster Resource Services (QcstChgClusterResourceServices) API


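  Required Parameter Group:

  1   Request handle                                     Output   Char(16)
  2   Cluster name                                       Input    Char(10)
  3   Cluster resource services information              Input    Char(*)
  4   Length of cluster resource services information    Input    Binary(4)
  5   Format name                                        Input    Char(8)
  6   Results information                                Input    Char(30)
  7   Error code                                         I/O      Char(*)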

  Service Program: QCSTCTL2

  Default Public Authority: *EXCLUDE

  Threadsafe: Yes

The Change Cluster Resource Services (QcstChgClusterResourceServices) API is used to tune cluster performance and change cluster configuration parameters.

Cluster performance is tuned using format CRSC0100 or CRSC0200. Format CRSC0100 provides a base level of tuning support, where the cluster adjusts to a predefined set of values identified for high, low, and normal timeout and messaging interval settings. If an advanced level of tuning is desired, usually with the help of IBM® support personnel, individual parameters may be tuned over a predefined range of values using format CRSC0200. Example control language command source is provided in base operating system option 7 (Example Tools Library, QUSRTOOL). See member TCSTINFO in file QUSRTOOL/QATTSYSC for more information.

The default tuning values are set on a create operation, and changes must be made using the Change Cluster Resource Services API documented here. Values for current settings may be retrieved using the Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo) API.

Using format CRSC0300, the user can define a cluster message queue and failover actions for the cluster. Prior to cluster version 6, a failover message queue could be defined for a CRG. If the failover message queue was defined, a message was enqueued during the failover of the CRG, allowing the user to cancel or continue the failover. If a cluster node ended or failed and there were multiple CRGs with that node as the primary recovery domain node, the user had to respond to a message for each CRG.

In cluster version 6 and above, the user has the option of receiving and responding to one message for all CRGs that are failing over to the same node when the primary node for the CRGs ends or fails. A cluster message queue, failover wait time, and failover default action may be specified on this API. If a node fails, that node is the primary recovery domain node for any active CRGs, and the cluster message queue is defined, then one message is enqueued on the cluster message queue. This gives the user the option of continuing all CRG failovers to the new primary or cancelling all CRG failovers. No message is enqueued if the primary node is removed from the cluster. If a CRG is failing over individually, one message is sent that controls the failover of that CRG. The message is placed on the message queue on the new primary node before the CRGs call their exit programs. If the failovers are cancelled, the primary node of the CRGs is not changed, and the cluster resource groups become Inactive. The exit programs are called with an action code of Failover Cancelled.

If the user wants to specify failover actions for a specific CRG, the failover message queue fields on the Create Cluster Resource Group API or Change Cluster Resource Group API should be used instead of the failover fields on the Create Cluster API or the Change Cluster Resource Services API. If the failover fields are set at a cluster level, they will override any CRG failover parameters. If the cluster message queue is set to *NONE, then the failover of each individual CRG can be controlled via the CRG failover parameters.

The rules for merging of partitioned nodes are as follows:

The following conditions apply to this API:

This API operates in an asynchronous mode. See Behavior of Cluster Resource Services APIs for more information.

Restriction: This API cannot be called from a cluster resource group exit program.

Authorities and Locks

The program that calls this API must be running under a user profile with *IOSYSCFG special authority.

User Queue Authority: *OBJOPR and *ADD
User Queue Library Authority: *EXECUTE
User Queue Lock: *EXCLRD
Cluster Message Queue Authority: *OBJOPR and *ADD
Cluster Message Queue Library Authority: *EXECUTE

Required Parameter Group

Request handle
OUTPUT; CHAR(16)

A unique string or handle that identifies this API call. It is used to associate this call to any responses placed on the user queue specified in the results information parameter.

Cluster name
INPUT; CHAR(10)

The name of the cluster.

Cluster resource services information
INPUT; CHAR(*)

Detailed information about the cluster resource services.

Length of cluster resource services information
INPUT; BINARY(4)

The length of the cluster resource services information.

Format name
INPUT; CHAR(8)

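The format of the cluster resource services information to be changed. The possible format names are:

CRSC0100: Base level tuning of cluster performance and configuration parameters.
CRSC0200: Advanced tuning of individual parameters.
CRSC0300: Cluster message queue and failover actions.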


Results information
INPUT; CHAR(30)

A library qualified user queue name followed by a reserved field.

Library qualified user queue: A user queue, which exists on the node from which the API was called, that receives results information after the function has completed on all active nodes in the cluster. See the Usage Notes section of this API for a description of the data that is placed on this queue. This is a 20-character field. The first 10 characters contain the user queue name and the second 10 characters contain the user queue library name. No special values are supported. QTEMP, *LIBL, and *CURLIB are not valid for the library name. The user queue must be a keyed queue.

Reserved: The last 10 characters of results information are reserved and must be set to hexadecimal zero.
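
The 30-character layout described above can be sketched as follows. This is only an illustration: the helper name `build_results_information`, the sample queue and library names, and the choice of the `cp037` EBCDIC code page are assumptions, not part of the API, and the character encoding on a given system may differ.

```python
def build_results_information(queue_name: str, library_name: str) -> bytes:
    """Return a 30-byte results information value (hypothetical helper).

    Layout taken from the parameter description: a 10-character user queue
    name, a 10-character library name, then 10 reserved bytes of
    hexadecimal zero.
    """
    # Library values that the description lists as not valid.
    if library_name.upper() in ("QTEMP", "*LIBL", "*CURLIB"):
        raise ValueError(f"library {library_name!r} is not valid here")
    for label, value in (("queue", queue_name), ("library", library_name)):
        if not 1 <= len(value) <= 10:
            raise ValueError(f"{label} name must be 1 to 10 characters")
    # Assumption: names are blank-padded and encoded as EBCDIC code page 037.
    qualified = (queue_name.ljust(10) + library_name.ljust(10)).encode("cp037")
    return qualified + bytes(10)  # reserved field: hexadecimal zeros


info = build_results_information("RESULTSQ", "MYLIB")
print(len(info))  # 30
```

The sketch checks only the constraints stated in the description; it does not verify that the queue exists or that it is keyed.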

Error code
I/O; CHAR(*)

The structure in which to return error information. For the format of the structure, see Error code parameter.


CRSC0100 Format



CRSC0200 Format


CRSC0300 Format


Field Descriptions

Note: Specify -1 on any parameters that are not changed. This pertains to format CRSC0200 only.

Note: Units and ranges for the fields described here may be found in the Field Settings Range Table located at the end of this Field Descriptions section of this document.

Ack remote fragments. Provides a switch to enable or disable a cluster messaging level acknowledgment for receipt of each fragment sent to a remote cluster node. Fragments are sent by the cluster messaging service for each cluster message whose size is greater than the specified Message fragment size. Remote cluster nodes are defined to be any nodes not on the local LAN (having a network or subnet IP address other than that of the source node for the message). ACKing remote fragments may be desirable in those few cases where low bandwidth gateways, routers, or bridges exist between local and remote systems.

CDAT protocol timeout interval. The timeout value used for distributing the Cluster Destination Address Table (CDAT) and synchronizing cluster communications when doing a create cluster, add node, or start node process. As the number of nodes in the cluster increases, the time required to run this synchronizing protocol increases. This is a low level Cluster Resource Services start-up protocol.

Cluster message queue library name. The name of the library that contains the user queue to receive cluster messages. The library name cannot be *CURLIB, QTEMP, *LIBL, *USRLIBL, *ALL, or *ALLUSR. This field must be set to hexadecimal zeroes if the cluster message queue name is *SAME or *NONE.

Cluster message queue name. The name of the message queue to receive messages relating to cluster or node level events. For cluster version 6, messages relating to failover will be sent to this queue. For node level failovers, one message will be sent which will control the failover of all CRGs with the same primary node. If a CRG is failing over individually, one message will be sent which will control the failover of that CRG. The message will be sent on the new primary node. If this field is set, the individual CRG failover message queue fields will not be used, and the specified message queue must exist on all started nodes in the cluster. The queue cannot be in an independent auxiliary storage pool. Valid special values for this field are:

Cluster recovery interval. The interval at which a cluster node takes inventory of required recovery actions and attempts automatic recovery as necessary. Those items checked are:

Configuration tuning level. Provides for a simple way to set cluster performance and configuration parameters. The valid values for this field are:

Delayed ack timer. The timer used over inbound reliable messages to force an acknowledgment for unacknowledged messages should the sender not have requested an acknowledgment over the last delayed ack time period. This timer is started on receipt of a reliable message and stopped when an acknowledgment is sent for one or more unacknowledged messages.

Enable multicast. The cluster communications infrastructure makes use of User Datagram Protocol (UDP) multicast capabilities as the preferred protocol for sending cluster management information between nodes in a cluster. Where multicast capabilities are supported by the underlying physical media, cluster communications will utilize the UDP multicast to send management messaging from a given node to all local cluster nodes supporting the same subnet address. Messages being sent to nodes on remote networks will always be sent using UDP point to point capabilities. Cluster communications does not rely on routing capability of multicast messages.

The multicast traffic supporting cluster management messaging tends by nature to be bursty. Depending on the number of nodes on a given LAN (supporting a common subnet address) and the complexity of the cluster management structure that is chosen by the cluster administrator, cluster related multicast packets can easily exceed 40 packets/second. Bursts of this nature could have a negative impact on older networking equipment. One example would be congestion problems on devices on the LAN serving as Simple Network Management Protocol (SNMP) agents which need to evaluate each and every UDP multicast packet. Some of the earlier networking equipment does not have adequate bandwidth to keep up with this type of traffic. Ensure that the network administrator has reviewed the capacity of the networks to handle UDP multicast traffic, so that clustering does not have a negative impact on the health and performance of the networks over which it operates.

If the network administrator does not want the more efficient multicast capabilities to be used, setting this field to FALSE (0) will disable the multicast capabilities of the cluster, and only point to point communications will be used by the cluster messaging services.

Failover default action. Indicates what clustering should do when a response to the failover message on the cluster message queue is not received within the failover wait time limit. If the cluster message queue is *NONE, this field must be set to 0. If the cluster message queue is *SAME and was previously *NONE, this field must be set to -1 or 0. Valid values are:

Failover wait time. Number of minutes to wait for a reply to the failover message that was enqueued on the cluster message queue. If the cluster message queue is *NONE, this field must be set to 0. If the cluster message queue is *SAME and was previously *NONE, this field must be set to -2 or 0. If a cluster message queue is specified, this field cannot be set to 0. Valid values are:
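
The constraints stated in the Failover default action and Failover wait time descriptions can be checked before calling the API. The following is a minimal sketch under stated assumptions: the function name `check_failover_fields` and the `previously_none` flag are hypothetical, and only the rules written out above are covered, not the full lists of valid values.

```python
def check_failover_fields(queue_name, previously_none, wait_time, default_action):
    """Return a list of rule violations; the list is empty if none are found.

    queue_name is the cluster message queue name or a special value.
    previously_none says whether the queue was *NONE before this change
    (hypothetical input; the caller would have to know this).
    """
    problems = []
    if queue_name == "*NONE":
        if wait_time != 0:
            problems.append("failover wait time must be 0 when the queue is *NONE")
        if default_action != 0:
            problems.append("failover default action must be 0 when the queue is *NONE")
    elif queue_name == "*SAME":
        if previously_none:
            if wait_time not in (-2, 0):
                problems.append("failover wait time must be -2 or 0")
            if default_action not in (-1, 0):
                problems.append("failover default action must be -1 or 0")
    else:
        # Assumption: "a cluster message queue is specified" means an
        # explicit queue name rather than a special value.
        if wait_time == 0:
            problems.append("failover wait time cannot be 0 when a queue is specified")
    return problems


print(check_failover_fields("*NONE", False, 0, 0))      # []
print(check_failover_fields("CLUMSGQ", False, 0, 0))    # one violation reported
```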

Length of fixed fields. The length of the fixed fields in the format structure. For format CRSC0300 this must be set to 32.

Maximum retry time. Reliable messages are resent at exponentially increasing intervals should they time out (that is, not receive a timely acknowledgment). The initial timeout value for a message is the Retry timer value, and each successive retry timeout increases by a factor of 2 until the Maximum retry time is exceeded. For the default case, a message would be sent, resent 1 second later, then 2 seconds, 4 seconds, and finally 8 seconds later. This represents a total of 15 seconds, after which alternate internet addresses are tried with the same timer values.
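
The doubling behavior described above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, not the cluster messaging implementation; the function name `retry_timeouts` is hypothetical, and the values 1 and 8 are the default-case figures quoted in the description.

```python
def retry_timeouts(retry_timer, maximum_retry_time):
    """Return the timeouts, in seconds, used for successive sends.

    Starts at retry_timer and doubles until the next value would exceed
    maximum_retry_time.
    """
    if retry_timer <= 0:
        raise ValueError("retry_timer must be positive")
    timeouts = []
    current = retry_timer
    while current <= maximum_retry_time:
        timeouts.append(current)
        current *= 2
    return timeouts


schedule = retry_timeouts(1, 8)
print(schedule)       # [1, 2, 4, 8]
print(sum(schedule))  # 15
```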

Maximum retry timer ratio. Remote subnets (remote cluster nodes on another LAN/WAN/BUS supporting a different subnet interface address than the sending node) use an extended message timeout value, which is the Maximum retry time used for local subnets (local cluster nodes supporting the same subnet interface address) multiplied by this ratio. For the default case, the Maximum retry time for a local multicast message would be 8 seconds and for a remote point to point message would be 8 x 8 = 64 seconds. This allows for network routing considerations.

Message fragment size. Cluster communications fragments its own messages. This fragment size should be set consistent with the physical media and routing capabilities throughout the network used for clustering. The preferred settings allow for the largest fragment size possible that does not exceed any of the hardware Maximum Transmission Units defined over the entire path so that clustering does all of the fragmentation, not the intermediary networks. The default is set to assume a minimum 1500 byte (less network header space) Ethernet environment.
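
To illustrate how a message larger than the fragment size is split, here is a minimal sketch. The function name `fragment_message` and the sizes used are arbitrary illustrative values, not defaults of the API, and the real cluster messaging service adds headers and sequencing that are not shown.

```python
def fragment_message(message: bytes, fragment_size: int) -> list:
    """Split message into pieces of at most fragment_size bytes."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    # Messages no larger than the fragment size are sent whole.
    if len(message) <= fragment_size:
        return [message]
    return [message[start:start + fragment_size]
            for start in range(0, len(message), fragment_size)]


fragments = fragment_message(bytes(2500), 1000)
print([len(fragment) for fragment in fragments])  # [1000, 1000, 500]
```

Choosing a fragment size no larger than the smallest Maximum Transmission Unit on the path keeps intermediary networks from fragmenting the pieces a second time.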

Message send window. The number of messages allowed outstanding without having received an acknowledgment. The higher the number, the lower the message latency but the larger the required buffer space on a node to save inbound messages.

Number of ack messages threshold. The number of repeat messages that are received over the course of a cluster recovery interval before acknowledgments are sent to multiple source interface addresses for a given node instead of just the current primary address for each message received. Although this increases the number of ACKs flowing, it reduces message resends and latency when an intermittent communications condition is detected. Eventually, one of the node addresses should be marked as failed, and at cluster recovery time messaging will settle back down to single acknowledgments.

Number of bad messages threshold. The number of undeliverable messages per Cluster recovery interval allowed before a failing status is assigned to a node's interface address. At this time, a secondary address (if available) is assigned to be the new primary interface address for the subject remote node.

Performance class. The requested performance characteristics of the cluster communications messaging protocol. Pacing is selectively used for sending out fragments of large messages. Messages are fragmented by the cluster messaging service at the specified message fragment size. The pacing mechanism releases a set number of fragments to the underlying physical layer, then delays, then releases a next set. This is to avoid over running slower physical media. Local here refers to nodes on a local LAN. Remote refers to messaging to cluster nodes on other than the local LAN. Valid values for the performance class are as follows:

Reachable heartbeat ack threshold. A node becomes reachable (formerly having been marked as unreachable) from a Cluster Communications heartbeating perspective if "Reachable heartbeat ack threshold" (or more) heartbeat message ACKs are received for the last "Reachable heartbeat threshold" heartbeat messages sent to a node. For the default case, a node becomes reachable if three or more of the last four heartbeats sent to the marked unreachable node are now acknowledged.

Reachable heartbeat threshold. See Reachable heartbeat ack threshold field description.

Receive/send heartbeat timer ratio. Ratio of incoming heartbeat messages expected from a neighboring node to the number of heartbeat messages that are sent out. The send rate is always set higher to ensure that a neighboring node's receive heartbeat timer does not fire under normal operational circumstances.

Retry timer value. See Maximum retry time field description.

Send heartbeat interval. The interval at which a low level Cluster Communications heartbeat message is sent to a neighboring node.

Send queue overflow. The maximum number of messages that are allowed to be queued up in a Cluster Communications outbound message queue. The cluster communication send queues are distributed amongst the various groups. The larger the number, the greater the memory resources that are required to support cluster messaging. If a send queue overflow is hit for a given group, the inability to send a message could lead to the termination of that group resulting from the lack of resources on a node.

Unreachable heartbeat ack threshold. A reachable node becomes unreachable from a Cluster Communications heartbeating perspective if "Unreachable heartbeat ack threshold" (or fewer) heartbeat message ACKs are received for the last "Unreachable heartbeat threshold" heartbeat messages sent to a node. For the default case, a node becomes unreachable if one or fewer of the last four heartbeats sent to the marked reachable node are acknowledged.

Unreachable heartbeat threshold. See Unreachable heartbeat ack threshold field description.
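
The four heartbeat threshold fields together describe a simple two-state rule. The sketch below models that rule using the default-case numbers quoted in the descriptions (three or more of four to become reachable, one or fewer of four to become unreachable). The class name `HeartbeatMonitor` is hypothetical, and waiting for a full window of heartbeats before changing state is an assumption; the actual Cluster Communications behavior may differ.

```python
from collections import deque


class HeartbeatMonitor:
    """Track whether a neighboring node is considered reachable."""

    def __init__(self, reachable_threshold=4, reachable_ack_threshold=3,
                 unreachable_threshold=4, unreachable_ack_threshold=1):
        if min(reachable_threshold, unreachable_threshold) < 1:
            raise ValueError("heartbeat thresholds must be at least 1")
        self.reachable_threshold = reachable_threshold
        self.reachable_ack_threshold = reachable_ack_threshold
        self.unreachable_threshold = unreachable_threshold
        self.unreachable_ack_threshold = unreachable_ack_threshold
        self.reachable = True
        # Keep only as many results as the larger window needs.
        self.history = deque(maxlen=max(reachable_threshold, unreachable_threshold))

    def record(self, acknowledged: bool) -> bool:
        """Record one heartbeat result and return the current state."""
        self.history.append(acknowledged)
        if self.reachable:
            window = list(self.history)[-self.unreachable_threshold:]
            # Assumption: no state change until a full window is available.
            if (len(window) == self.unreachable_threshold
                    and sum(window) <= self.unreachable_ack_threshold):
                self.reachable = False
        else:
            window = list(self.history)[-self.reachable_threshold:]
            if (len(window) == self.reachable_threshold
                    and sum(window) >= self.reachable_ack_threshold):
                self.reachable = True
        return self.reachable


monitor = HeartbeatMonitor()
results = [True, False, False, False, True, True, True]
print([monitor.record(result) for result in results])
# [True, True, True, False, False, False, True]
```

Using a lower ACK count to leave the reachable state than to re-enter it keeps a node with intermittent losses from flipping between states on every heartbeat.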


Field Settings for CRSC0200 Format



Field Settings Range


Usage Notes

Results Information User Queue

Asynchronous results are returned to a user queue specified by the Results Information parameter of the API. See Cluster APIs Use of User Queues and Using Results Information for details on how to create the results information user queue, the format of the entries, and how to use the data placed on the queue. The data is sent to the user queue in the form of a message identifier and the substitution data for the message (if any exists). The following identifies the data sent to the user queue (excluding the message text).



Error Messages

Messages that are delivered through the error code parameter are listed here. The data (messages) sent to the results information user queue are listed in the Usage Notes above.



API introduced: V5R1
