Change Cluster Resource Services (QcstChgClusterResourceServices) API

Required Parameter Group:

Request handle                                    Output  Char(16)
Cluster name                                      Input   Char(10)
Cluster resource services information            Input   Char(*)
Length of cluster resource services information  Input   Binary(4)
Format name                                       Input   Char(8)
Results information                               Input   Char(30)
Error code                                        I/O     Char(*)

Service Program: QCSTCTL2

Default Public Authority: *EXCLUDE

Threadsafe: Yes

The Change Cluster Resource Services (QcstChgClusterResourceServices) API is used to tune cluster performance and change cluster configuration parameters.

This API offers two levels of support for tuning cluster performance. Format CRSC0100 provides base-level tuning, in which the cluster adjusts to one of a predefined set of high, low, or normal timeout and messaging interval values. Format CRSC0200 provides advanced tuning, usually undertaken with the help of IBM® support personnel, in which individual parameters can be tuned over a predefined range of values. Example control language (CL) command source is provided in option 7 of the base operating system (Example Tools Library, QUSRTOOL). See member TCSTINFO in file QUSRTOOL/QATTSYSC for more information.

The default tuning values are set when the cluster is created; any later changes must be made with the Change Cluster Resource Services API documented here. Current settings can be retrieved with the Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo) API.

Using format CRSC0300, the user can define a cluster message queue and failover actions for the cluster. Prior to cluster version 6, a failover message queue could be defined for each cluster resource group (CRG). If the failover message queue was defined, a message was enqueued during the failover of the CRG, allowing the user to cancel or continue the failover. If a cluster node ended or failed while it was the primary recovery domain node for multiple CRGs, the user had to respond to a separate message for each CRG.

In cluster version 6 and above, the user can instead receive and respond to one message for all CRGs that are failing over to the same node when the primary node for those CRGs ends or fails. A cluster message queue, failover wait time, and failover default action can be specified on this API. If a node fails while it is the primary recovery domain node for any active CRGs, and a cluster message queue is defined, one message is enqueued on the cluster message queue. This gives the user the option of continuing all of those CRG failovers to the new primary node, or cancelling all of them. No message is enqueued if the primary node is removed from the cluster. If a CRG is failing over individually, one message is sent that controls the failover of that CRG. The message is placed on the message queue on the new primary node before the CRGs call their exit programs. If the failovers are cancelled, the primary node of the CRGs is not changed, the cluster resource groups become Inactive, and the exit programs are called with an action code of Failover Cancelled.
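The grouping rule can be modeled in a few lines. This is a hypothetical sketch, not the cluster implementation: node IDs are plain integers, the function name is invented, and it only counts how many messages would be enqueued, one per distinct new primary node, following the statement that CRGs failing over to the same node share a single message.

```c
#include <stdbool.h>

/* Hypothetical model: new_primary[i] is the node that CRG i fails over to.
   Returns how many failover messages would be enqueued on the cluster
   message queue (cluster version 6 and above). */
static int failover_message_count(const int new_primary[], int n_crgs,
                                  bool cluster_msgq_defined, bool node_removed)
{
    /* No cluster message queue, or primary removed from the cluster: no message. */
    if (!cluster_msgq_defined || node_removed)
        return 0;

    /* One message per distinct new primary node. */
    int distinct = 0;
    for (int i = 0; i < n_crgs; i++) {
        bool seen = false;
        for (int j = 0; j < i; j++) {
            if (new_primary[j] == new_primary[i]) {
                seen = true;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

A CRG failing over individually is the one-element case, which yields exactly one message.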

If the user wants to specify failover actions for a specific CRG, the failover message queue fields on the Create Cluster Resource Group API or Change Cluster Resource Group API should be used instead of the failover fields on the Create Cluster API or the Change Cluster Resource Services API. If the failover fields are set at a cluster level, they will override any CRG failover parameters. If the cluster message queue is set to *NONE, then the failover of each individual CRG can be controlled via the CRG failover parameters.
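The precedence between cluster-level and CRG-level failover settings can be sketched as a small lookup. The function and its string arguments are hypothetical; "*NONE" stands for "no queue defined" at either level, and the sketch assumes cluster version 6 or above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the message queue that controls failover for
   one CRG, or NULL when neither level defines a queue and failover proceeds
   without a message. */
static const char *effective_failover_msgq(const char *cluster_msgq,
                                           const char *crg_failover_msgq)
{
    if (strcmp(cluster_msgq, "*NONE") != 0)
        return cluster_msgq;          /* cluster-level fields override the CRG */
    if (strcmp(crg_failover_msgq, "*NONE") != 0)
        return crg_failover_msgq;     /* per-CRG failover message queue */
    return NULL;
}
```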

The following rule applies to merging partitioned nodes:

If the tuning and configuration parameters set with the Change Cluster Resource Services API documented here match exactly in both partitions, a merge is allowed.
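The merge rule amounts to an exact, field-by-field comparison of the two partitions' settings. In this sketch the settings snapshot is a hypothetical array of the 20 CRSC0200 values (for example, as retrieved with QcstRetrieveCRSInfo); a complete check would also cover the CRSC0300 configuration fields.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRS_TUNABLE_COUNT 20   /* number of CRSC0200 fields */

/* Hypothetical snapshot of one partition's tuning values. */
typedef struct {
    int64_t value[CRS_TUNABLE_COUNT];
} crs_settings_t;

/* A merge is allowed when every value matches exactly. */
static bool merge_allowed(const crs_settings_t *a, const crs_settings_t *b)
{
    for (int i = 0; i < CRS_TUNABLE_COUNT; i++)
        if (a->value[i] != b->value[i])
            return false;
    return true;
}
```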

The following conditions apply to this API:

If the cluster message queue is specified, it must exist on all started nodes in the cluster.

This API operates in an asynchronous mode. See Behavior of Cluster Resource Services APIs for more information.

Restriction: This API cannot be called from a cluster resource group exit program.

Authorities and Locks

The program that calls this API must be running under a user profile with *IOSYSCFG special authority.

User Queue Authority: *OBJOPR and *ADD
User Queue Library Authority: *EXECUTE
User Queue Lock: *EXCLRD
Cluster Message Queue Authority: *OBJOPR and *ADD
Cluster Message Queue Library Authority: *EXECUTE

Required Parameter Group

Request handle

OUTPUT; CHAR(16)

A unique string or handle that identifies this API call. It is used to associate this call with any responses placed on the user queue specified in the results information parameter.

Cluster name

INPUT; CHAR(10)

The name of the cluster.

Cluster resource services information

INPUT; CHAR(*)

The cluster resource services information to be changed, laid out as described by the format name parameter.

Length of cluster resource services information

INPUT; BINARY(4)

The length of the cluster resource services information.

Format name

INPUT; CHAR(8)

The format of the Cluster Resource Services information to be changed. The possible format names are:

CRSC0100

Automatically tune the cluster performance and configuration parameters to a predefined high, low, or normal level of heartbeat intervals and message timeout values.

CRSC0200

Manually tune one or more of the cluster performance and configuration parameters.

CRSC0300

Change cluster configuration attributes.

Results information

INPUT; CHAR(30)

A library qualified user queue name followed by a reserved field.

Library qualified user queue: A user queue that exists on the node from which the API was called and that receives results information after the function has completed on all active nodes in the cluster. See the Usage Notes section of this API for a description of the data that is placed on this queue. This is a 20-character field: the first 10 characters contain the user queue name and the second 10 characters contain the user queue library name. No special values are supported. QTEMP, *LIBL, and *CURLIB are not valid for the library name. The user queue must be keyed.

Reserved: The last 10 characters of results information are reserved and must be set to hexadecimal zero.
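Building the 30-byte results information parameter is mostly a matter of padding and zero-filling. This is a sketch with hypothetical helper names and example queue and library names; it assumes the usual IBM i convention of blank-padded object names. When compiled with ILE C on IBM i, the literals and the blank pad character are EBCDIC automatically.

```c
#include <stdbool.h>
#include <string.h>

/* Copy a name into a fixed-width field, padding with blanks
   (names longer than the field are truncated). */
static void put_padded_name(char *dest, size_t width, const char *name)
{
    size_t len = strlen(name);
    if (len > width)
        len = width;
    memset(dest, ' ', width);
    memcpy(dest, name, len);
}

/* QTEMP, *LIBL and *CURLIB are not valid as the user queue library. */
static bool results_library_valid(const char *library)
{
    return strcmp(library, "QTEMP") != 0 &&
           strcmp(library, "*LIBL") != 0 &&
           strcmp(library, "*CURLIB") != 0;
}

/* Build the results information parameter:
   bytes 0-9   user queue name
   bytes 10-19 user queue library name
   bytes 20-29 reserved, set to hexadecimal zeros */
static bool build_results_info(char out[30], const char *queue, const char *library)
{
    if (!results_library_valid(library))
        return false;
    put_padded_name(out, 10, queue);
    put_padded_name(out + 10, 10, library);
    memset(out + 20, 0, 10);
    return true;
}
```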

Error code

I/O; CHAR(*)

The structure in which to return error information. For the format of the structure, see Error code parameter.

CRSC0100 Format

Offset
Dec   Hex   Type        Field
0     0     BINARY(4)   Configuration tuning level

CRSC0200 Format

Offset
Dec   Hex   Type        Field
0     0     BINARY(8)   Receive/send heartbeat timer ratio
8     8     BINARY(8)   Maximum retry timer ratio
16    10    BINARY(8)   Send heartbeat interval
24    18    BINARY(8)   Retry timer value
32    20    BINARY(8)   CDAT protocol timeout interval
40    28    BINARY(8)   Cluster recovery interval
48    30    BINARY(8)   Maximum retry time
56    38    BINARY(8)   Message fragment size
64    40    BINARY(8)   Send queue overflow
72    48    BINARY(8)   Number of bad messages threshold
80    50    BINARY(8)   Number of ack messages threshold
88    58    BINARY(8)   Unreachable heartbeat ack threshold
96    60    BINARY(8)   Reachable heartbeat ack threshold
104   68    BINARY(8)   Unreachable heartbeat threshold
112   70    BINARY(8)   Reachable heartbeat threshold
120   78    BINARY(8)   Delayed ack timer
128   80    BINARY(8)   Message send window
136   88    BINARY(8)   Enable multicast
144   90    BINARY(8)   Performance class
152   98    BINARY(8)   Ack remote fragments

CRSC0300 Format

Offset
Dec   Hex   Type        Field
0     0     BINARY(4)   Length of fixed fields
4     4     CHAR(10)    Cluster message queue name
14    E     CHAR(10)    Cluster message queue library name
24    18    BINARY(4)   Failover wait time
28    1C    BINARY(4)   Failover default action
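The CRSC0300 layout maps directly onto a C structure with no padding, which the static assertions below check against the offsets in the table. The type and function names are hypothetical, and the sketch assumes compilation on IBM i, where native big-endian integers match BINARY(4). Passing NULL for the library stores hexadecimal zeros, as required when the queue name is *SAME or *NONE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of format CRSC0300 (32 bytes). */
typedef struct {
    int32_t length_of_fixed_fields;   /* offset 0:  must be 32 */
    char    msgq_name[10];            /* offset 4:  name, *SAME or *NONE */
    char    msgq_library[10];         /* offset 14: library or hex zeros */
    int32_t failover_wait_time;       /* offset 24 */
    int32_t failover_default_action;  /* offset 28 */
} crsc0300_t;

_Static_assert(offsetof(crsc0300_t, msgq_library) == 14, "CRSC0300 layout");
_Static_assert(offsetof(crsc0300_t, failover_wait_time) == 24, "CRSC0300 layout");
_Static_assert(sizeof(crsc0300_t) == 32, "CRSC0300 layout");

/* Copy a blank-padded name into a 10-byte field. */
static void put_name10(char dest[10], const char *name)
{
    size_t n = strlen(name);
    memset(dest, ' ', 10);
    memcpy(dest, name, n < 10 ? n : 10);
}

/* Fill a CRSC0300 structure; library == NULL leaves the field as hex zeros. */
static void crsc0300_init(crsc0300_t *f, const char *queue, const char *library,
                          int32_t wait_minutes, int32_t default_action)
{
    memset(f, 0, sizeof *f);
    f->length_of_fixed_fields = 32;
    put_name10(f->msgq_name, queue);
    if (library != NULL)
        put_name10(f->msgq_library, library);
    f->failover_wait_time = wait_minutes;
    f->failover_default_action = default_action;
}
```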

Field Descriptions

Note: For format CRSC0200 only, specify -1 for any field that is not to be changed.

Note: Units and ranges for the fields described here may be found in the Field Settings Range Table located at the end of this Field Descriptions section of this document.
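Because every CRSC0200 field is a BINARY(8) value at offset 8 × (field position), the format can be modeled as an array of 20 signed 64-bit integers. The -1 convention then becomes "initialize everything to -1 and set only what changes". The enum names below are hypothetical labels for the field positions, and the sketch assumes native big-endian integers, as on IBM i.

```c
#include <stdint.h>

/* Hypothetical field positions; field i starts at offset 8 * i. */
enum crsc0200_field {
    RECV_SEND_HB_TIMER_RATIO,      /* offset 0   */
    MAX_RETRY_TIMER_RATIO,         /* offset 8   */
    SEND_HB_INTERVAL,              /* offset 16  */
    RETRY_TIMER_VALUE,             /* offset 24  */
    CDAT_PROTOCOL_TIMEOUT,         /* offset 32  */
    CLUSTER_RECOVERY_INTERVAL,     /* offset 40  */
    MAX_RETRY_TIME,                /* offset 48  */
    MESSAGE_FRAGMENT_SIZE,         /* offset 56  */
    SEND_QUEUE_OVERFLOW,           /* offset 64  */
    BAD_MESSAGES_THRESHOLD,        /* offset 72  */
    ACK_MESSAGES_THRESHOLD,        /* offset 80  */
    UNREACHABLE_HB_ACK_THRESHOLD,  /* offset 88  */
    REACHABLE_HB_ACK_THRESHOLD,    /* offset 96  */
    UNREACHABLE_HB_THRESHOLD,      /* offset 104 */
    REACHABLE_HB_THRESHOLD,        /* offset 112 */
    DELAYED_ACK_TIMER,             /* offset 120 */
    MESSAGE_SEND_WINDOW,           /* offset 128 */
    ENABLE_MULTICAST,              /* offset 136 */
    PERFORMANCE_CLASS,             /* offset 144 */
    ACK_REMOTE_FRAGMENTS,          /* offset 152 */
    CRSC0200_FIELD_COUNT
};

/* Each field is a BINARY(8), that is, a signed 64-bit integer. */
typedef struct {
    int64_t field[CRSC0200_FIELD_COUNT];
} crsc0200_t;

_Static_assert(sizeof(crsc0200_t) == 160, "CRSC0200 is 20 x 8 bytes");

/* Mark every field as "not changed" (-1). */
static void crsc0200_init_unchanged(crsc0200_t *f)
{
    for (int i = 0; i < CRSC0200_FIELD_COUNT; i++)
        f->field[i] = -1;
}
```

For example, to change only the message fragment size, call `crsc0200_init_unchanged` and then assign `f.field[MESSAGE_FRAGMENT_SIZE]`.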

Ack remote fragments. Provides a switch to enable or disable a cluster messaging level acknowledgment for receipt of each fragment sent to a remote cluster node. Fragments are sent by the cluster messaging service for each cluster message whose size is greater than the specified Message fragment size. Remote cluster nodes are defined to be any nodes not on the local LAN (having a network or subnet IP address other than that of the source node for the message). ACKing remote fragments may be desirable in those few cases where low bandwidth gateways, routers, or bridges exist between local and remote systems.

CDAT protocol timeout interval. The timeout value used when distributing the Cluster Destination Address Table (CDAT) and synchronizing cluster communications during a create cluster, add node, or start node process. As the number of nodes in the cluster increases, the time required to run this synchronization protocol increases. This is a low-level Cluster Resource Services start-up protocol.

Cluster message queue library name. The name of the library that contains the message queue to receive cluster messages. The library name cannot be *CURLIB, QTEMP, *LIBL, *USRLIBL, *ALL, or *ALLUSR. This field must be set to hexadecimal zeros if the cluster message queue name is *SAME or *NONE.

Cluster message queue name. The name of the message queue to receive messages relating to cluster or node level events. Starting with cluster version 6, messages relating to failover are sent to this queue. For node level failovers, one message is sent that controls the failover of all CRGs with the same primary node. If a CRG is failing over individually, one message is sent that controls the failover of that CRG. The message is sent on the new primary node. If a cluster message queue is defined, the individual CRG failover message queue fields are not used, and the specified message queue must exist on all started nodes in the cluster. The queue cannot be in an independent auxiliary storage pool. Valid special values for this field are:

*SAME

The current cluster message queue is not changed.

*NONE

No cluster message queue is defined.
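The name and library rules for the cluster message queue can be checked before the CRSC0300 structure is filled in. This sketch is hypothetical: NULL stands for the hexadecimal-zero library field, and rejecting a real queue name that has no library is an assumption, since the text does not state it explicitly.

```c
#include <stdbool.h>
#include <string.h>

/* Library names that cannot hold the cluster message queue. */
static const char *const disallowed_libraries[] = {
    "*CURLIB", "QTEMP", "*LIBL", "*USRLIBL", "*ALL", "*ALLUSR"
};

/* *SAME and *NONE require the library field to be hexadecimal zeros. */
static bool library_must_be_zeros(const char *queue_name)
{
    return strcmp(queue_name, "*SAME") == 0 || strcmp(queue_name, "*NONE") == 0;
}

/* Check a (queue name, library) pair; library == NULL means hex zeros. */
static bool msgq_fields_valid(const char *queue_name, const char *library)
{
    size_t count = sizeof disallowed_libraries / sizeof disallowed_libraries[0];

    if (library_must_be_zeros(queue_name))
        return library == NULL;
    if (library == NULL)
        return false;   /* assumption: a named queue needs a library */
    for (size_t i = 0; i < count; i++)
        if (strcmp(library, disallowed_libraries[i]) == 0)
            return false;
    return true;
}
```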

Cluster recovery interval. The interval at which a cluster node takes inventory of required recovery actions and attempts automatic recovery as necessary. The items checked are:

Unreachable alternate point-to-point interface addresses for remote nodes.
Unreachable multicast IP address for the local subnet.
Partitioned nodes.

Configuration tuning level. Provides a simple way to set cluster performance and configuration parameters. The valid values for this field are:

Adjustments are made to cluster communications to decrease the heartbeating frequency and increase the various message timeout values. With fewer heartbeats and longer timeout values, the cluster will be slower to respond (less sensitive) to communications failures.

Default values are used for cluster communications performance and configuration parameters. This setting may be used to return all parameters to the original default values.

Adjustments are made to cluster communications to increase the heartbeating frequency and decrease the various message timeout values. With more frequent heartbeats and shorter timeout values, the cluster will be quicker to respond (more sensitive) to communications failures.

Delayed ack timer. The timer used over inbound reliable messages to force an acknowledgment for unacknowledged messages should the sender not have requested an acknowledgment over the last delayed ack time period. This timer is started on receipt of a reliable message and stopped when an acknowledgment is sent for one or more unacknowledged messages.
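The delayed ack timer behaves like a simple one-shot timer on the receiving side. This sketch uses hypothetical names and plain millisecond timestamps: the timer starts when a reliable message arrives and no timer is running, stops whenever an acknowledgment is sent, and forces an acknowledgment if it expires while messages are still unacknowledged.

```c
#include <stdbool.h>

/* Hypothetical receiver-side state for the delayed ack timer. */
typedef struct {
    bool running;
    long deadline_ms;   /* time at which the timer fires */
    long unacked;       /* reliable messages received but not yet acknowledged */
} delayed_ack_t;

/* Start the timer on receipt of a reliable message (if not already running). */
static void on_reliable_message(delayed_ack_t *d, long now_ms, long delay_ms)
{
    d->unacked++;
    if (!d->running) {
        d->running = true;
        d->deadline_ms = now_ms + delay_ms;
    }
}

/* Any acknowledgment, requested by the sender or forced, stops the timer. */
static void on_ack_sent(delayed_ack_t *d)
{
    d->unacked = 0;
    d->running = false;
}

/* True if the timer has fired and an acknowledgment must be forced now. */
static bool delayed_ack_due(const delayed_ack_t *d, long now_ms)
{
    return d->running && d->unacked > 0 && now_ms >= d->deadline_ms;
}
```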

Enable multicast. The cluster communications infrastructure uses User Datagram Protocol (UDP) multicast as the preferred protocol for sending cluster management information between nodes in a cluster. Where multicast is supported by the underlying physical media, cluster communications uses UDP multicast to send management messages from a given node to all local cluster nodes on the same subnet address. Messages sent to nodes on remote networks are always sent using UDP point-to-point communications. Cluster communications does not rely on the routing of multicast messages.

The multicast traffic supporting cluster management messaging tends by nature to be bursty. Depending on the number of nodes on a given LAN (supporting a common subnet address) and the complexity of the cluster management structure chosen by the cluster administrator, cluster-related multicast traffic can easily exceed 40 packets per second. Bursts of this nature can have a negative impact on older networking equipment. One example is congestion on LAN devices serving as Simple Network Management Protocol (SNMP) agents, which need to evaluate every UDP multicast packet. Some earlier networking equipment does not have adequate bandwidth to keep up with this type of traffic. Ensure that the network administrator has reviewed the capacity of the networks to handle UDP multicast traffic, to make certain that clustering will not harm the health and performance of the networks over which it operates.

If the network administrator does not want the more efficient multicast capabilities to be used, setting this field to FALSE (0) disables the multicast capabilities of the cluster, and the cluster messaging services use only point-to-point communications.

Failover default action. Indicates what clustering should do when no response to the failover message on the cluster message queue is received within the failover wait time. If the cluster message queue is *NONE, this field must be set to 0. If the cluster message queue is *SAME and was previously *NONE, this field must be set to -1 or 0. Valid values are:

-1

Failover default action is not changed.

0

Proceed with failover.

1

Do NOT attempt failover.

Failover wait time. Number of minutes to wait for a reply to the failover message that was enqueued on the cluster message queue. If the cluster message queue is *NONE, this field must be set to 0. If the cluster message queue is *SAME and was previously *NONE, this field must be set to -2 or 0. If a cluster message queue is specified, this field cannot be set to 0. Valid values are:

-2

Failover wait time is not changed.

-1

Wait forever until a response is given to the failover message.

0

Failover proceeds without user intervention. Acts the same as V5R4M0 and prior.

>=1

Number of minutes to wait for a response to the failover message. If no response is received in the specified number of minutes, the failover default action field determines how to proceed.

Length of fixed fields. The length of the fixed fields in the format structure. For format CRSC0300 this must be set to 32.

Maximum retry time. Reliable messages are resent at exponentially increasing intervals if they time out (that is, do not receive a timely acknowledgment). The initial timeout value for a message is the Retry timer value, and each successive retry doubles the timeout, up to the Maximum retry time. For the default case, a message is sent, then resent 1 second later, then after 2 seconds, 4 seconds, and finally 8 seconds. This represents a total of 15 seconds, after which alternate IP addresses are tried with the same timer values.

Maximum retry timer ratio. Remote subnets (remote cluster nodes on another LAN, WAN, or bus with a different subnet interface address than the sending node) use an extended message timeout value derived from the Maximum retry time used for local subnets (local cluster nodes with the same subnet interface address). For the default case, the Maximum retry time for a local multicast message is 8 seconds, and for a remote point-to-point message it is 8 x 8 = 64 seconds. This allows for network routing considerations.
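The retry arithmetic in the two descriptions above can be reproduced with a short doubling loop. The function names are hypothetical. The local default case (Retry timer value 1 second, Maximum retry time 8 seconds) gives 1 + 2 + 4 + 8 = 15 seconds. Applying the same doubling with the remote maximum of 8 x 8 = 64 seconds is an assumption about how the ratio is used.

```c
#include <stddef.h>

/* Write successive retry timeouts (seconds) into out[]: start at retry_timer
   and double while the timeout stays within max_retry. Returns the count. */
static size_t retry_schedule(long retry_timer, long max_retry, long out[], size_t cap)
{
    size_t n = 0;
    for (long t = retry_timer; t > 0 && t <= max_retry && n < cap; ) {
        out[n++] = t;
        if (t > max_retry / 2)
            break;            /* the next doubling would exceed the maximum */
        t *= 2;
    }
    return n;
}

/* Total time spent on one address before alternate addresses are tried. */
static long retry_total(long retry_timer, long max_retry)
{
    long buf[64];
    size_t n = retry_schedule(retry_timer, max_retry, buf, 64);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

With these definitions, `retry_total(1, 8)` is 15 and `retry_total(1, 8 * 8)` is 127.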

Message fragment size. Cluster communications fragments its own messages. This fragment size should be set consistent with the physical media and routing capabilities throughout the network used for clustering. The preferred settings allow for the largest fragment size possible that does not exceed any of the hardware Maximum Transmission Units defined over the entire path so that clustering does all of the fragmentation, not the intermediary networks. The default is set to assume a minimum 1500 byte (less network header space) Ethernet environment.
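The effect of the fragment size on a given message is a ceiling division. This sketch uses hypothetical function names and ignores any per-fragment header overhead. The range check uses the minimum and maximum from the Field Settings Range table (540 to 32,500 bytes).

```c
#include <stdbool.h>

/* Fragments needed for a message of msg_len bytes at frag_size bytes per
   fragment; messages no larger than frag_size are sent whole. */
static long fragment_count(long msg_len, long frag_size)
{
    if (msg_len <= frag_size)
        return 1;
    return (msg_len + frag_size - 1) / frag_size;   /* ceiling division */
}

/* Allowed range for the Message fragment size field. */
static bool fragment_size_in_range(long frag_size)
{
    return frag_size >= 540 && frag_size <= 32500;
}
```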

Message send window. The number of messages allowed outstanding without having received an acknowledgment. The higher the number, the lower the message latency but the larger the required buffer space on a node to save inbound messages.
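A send window of this kind is ordinary sliding-window flow control. This is a hypothetical sender-side sketch: a message may go out only while fewer than the window size are unacknowledged, and an acknowledgment can release one or more outstanding messages.

```c
#include <stdbool.h>

/* Hypothetical sender-side bookkeeping for the message send window. */
typedef struct {
    long window;        /* Message send window */
    long outstanding;   /* sent but not yet acknowledged */
} send_window_t;

static bool window_can_send(const send_window_t *w)
{
    return w->outstanding < w->window;
}

/* Returns false when the window is full; the caller must hold the message. */
static bool window_on_send(send_window_t *w)
{
    if (!window_can_send(w))
        return false;
    w->outstanding++;
    return true;
}

/* An acknowledgment may cover several outstanding messages. */
static void window_on_ack(send_window_t *w, long acked)
{
    w->outstanding -= acked;
    if (w->outstanding < 0)
        w->outstanding = 0;
}
```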

Number of ack messages threshold. The number of repeat messages received during a cluster recovery interval before acknowledgments are sent to multiple source interface addresses for a given node, instead of only to the current primary address, for each message received. Although this increases the number of ACKs flowing, it reduces message resends and latency when an intermittent communications condition is detected. Eventually, one of the node addresses should be marked as failed, and at cluster recovery time, messaging settles back down to single acknowledgments.

Number of bad messages threshold. The number of undeliverable messages per Cluster recovery interval allowed before a failing status is assigned to a node's interface address. At this time, a secondary address (if available) is assigned to be the new primary interface address for the subject remote node.
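Both thresholds are counts taken over one cluster recovery interval. In this sketch the counter structure and function names are hypothetical. The comparisons are assumptions read from the wording: multi-address acknowledgments start once the repeat count reaches its threshold, and an address is marked failing once the undeliverable count exceeds the number allowed.

```c
#include <stdbool.h>

/* Hypothetical per-node counters, reset at each cluster recovery interval. */
typedef struct {
    long repeat_msgs;        /* repeated messages received */
    long undeliverable_msgs; /* messages that could not be delivered */
} interval_counters_t;

/* Number of ack messages threshold (assumed: reaching it triggers the switch). */
static bool ack_all_addresses(const interval_counters_t *c, long ack_threshold)
{
    return c->repeat_msgs >= ack_threshold;
}

/* Number of bad messages threshold (assumed: exceeding it marks the address). */
static bool mark_address_failing(const interval_counters_t *c, long bad_threshold)
{
    return c->undeliverable_msgs > bad_threshold;
}

static void interval_reset(interval_counters_t *c)
{
    c->repeat_msgs = 0;
    c->undeliverable_msgs = 0;
}
```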

Performance class. The requested performance characteristics of the cluster communications messaging protocol. Pacing is selectively used when sending fragments of large messages. Messages are fragmented by the cluster messaging service at the specified message fragment size. The pacing mechanism releases a set number of fragments to the underlying physical layer, delays, and then releases the next set. This avoids overrunning slower physical media. Here, local refers to nodes on the local LAN, and remote refers to cluster nodes on any other LAN. Valid values for the performance class are as follows:

Normal: Pacing applied to local and remote fragments.

High Throughput Local: Pacing applied to remote fragments.

High Throughput Local and Remote: No pacing of any fragmented messages.

High Throughput Remote: Pacing applied to local fragments.
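The four performance classes reduce to a yes/no pacing decision per destination. The enum below is hypothetical and does not reflect the numeric codes the API expects for this field.

```c
#include <stdbool.h>

/* Hypothetical labels for the four performance classes. */
enum perf_class {
    PERF_NORMAL,
    PERF_HT_LOCAL,
    PERF_HT_LOCAL_AND_REMOTE,
    PERF_HT_REMOTE
};

/* Does pacing apply to fragments sent to a local or a remote node? */
static bool pacing_applies(enum perf_class pc, bool remote)
{
    switch (pc) {
    case PERF_NORMAL:              return true;     /* local and remote */
    case PERF_HT_LOCAL:            return remote;   /* remote fragments only */
    case PERF_HT_REMOTE:           return !remote;  /* local fragments only */
    case PERF_HT_LOCAL_AND_REMOTE: return false;    /* no pacing */
    }
    return true;
}
```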

Reachable heartbeat ack threshold. A node previously marked as unreachable becomes reachable again, from a Cluster Communications heartbeating perspective, if "Reachable heartbeat ack threshold" or more heartbeat message ACKs are received for the last "Reachable heartbeat threshold" heartbeat messages sent to the node. For the default case, a node becomes reachable if 3 or more of the last 4 heartbeats sent to the node marked unreachable are acknowledged.

Reachable heartbeat threshold. See Reachable heartbeat ack threshold field description.

Receive/send heartbeat timer ratio. Ratio of incoming heartbeat messages expected from a neighboring node to the number of heartbeat messages that are sent out. The send rate is always set higher to ensure that a neighboring node's receive heartbeat timer does not fire under normal operating circumstances.

Retry timer value. See Maximum retry time field description.

Send heartbeat interval. The interval at which a low level Cluster Communications heartbeat message is sent to a neighboring node.

Send queue overflow. The maximum number of messages allowed to be queued in a Cluster Communications outbound message queue. The cluster communications send queues are distributed among the various groups. The larger the number, the more memory is required to support cluster messaging. If a group's send queue overflows, the inability to send a message could lead to termination of that group because of the lack of resources on the node.

Unreachable heartbeat ack threshold. A reachable node becomes unreachable from a Cluster Communications heartbeating perspective if "Unreachable heartbeat ack threshold" or fewer heartbeat message ACKs are received for the last "Unreachable heartbeat threshold" heartbeat messages sent to the node. For the default case, a node becomes unreachable if one or fewer of the last 4 heartbeats sent to the node marked reachable are acknowledged.

Unreachable heartbeat threshold. See Unreachable heartbeat ack threshold field description.
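The reachable and unreachable thresholds together form a small state machine over the most recent heartbeats sent to a node. This sketch uses hypothetical names and a ring buffer of ACK results. It assumes a rule is only evaluated once enough heartbeats have been recorded, and that both window sizes are at most HB_HISTORY_MAX. With the documented defaults (1 of 4 to become unreachable, 3 of 4 to become reachable again), the test sequence flips the node to unreachable and back.

```c
#include <stdbool.h>

#define HB_HISTORY_MAX 32

/* Hypothetical record of recent heartbeats sent to one node. */
typedef struct {
    bool acked[HB_HISTORY_MAX];  /* ring buffer: was heartbeat i acknowledged? */
    int  count;                  /* heartbeats recorded (capped at max) */
    int  next;                   /* next ring buffer slot */
    bool reachable;
} hb_state_t;

static void hb_record(hb_state_t *s, bool acked)
{
    s->acked[s->next] = acked;
    s->next = (s->next + 1) % HB_HISTORY_MAX;
    if (s->count < HB_HISTORY_MAX)
        s->count++;
}

/* Count ACKs among the last n heartbeats recorded. */
static int hb_acks_in_last(const hb_state_t *s, int n)
{
    int acks = 0;
    if (n > s->count)
        n = s->count;
    for (int i = 1; i <= n; i++) {
        int idx = (s->next - i + HB_HISTORY_MAX) % HB_HISTORY_MAX;
        if (s->acked[idx])
            acks++;
    }
    return acks;
}

/* Apply the unreachable / reachable threshold rules. */
static void hb_evaluate(hb_state_t *s, int unreach_ack_thr, int unreach_thr,
                        int reach_ack_thr, int reach_thr)
{
    if (s->reachable) {
        if (s->count >= unreach_thr &&
            hb_acks_in_last(s, unreach_thr) <= unreach_ack_thr)
            s->reachable = false;
    } else {
        if (s->count >= reach_thr &&
            hb_acks_in_last(s, reach_thr) >= reach_ack_thr)
            s->reachable = true;
    }
}
```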

Field Settings for CRSC0200 Format

Configuration Tuning Level

Field

Unit

Receive/send heartbeat timer ratio

unitless

Maximum retry timer ratio

unitless

Send heartbeat interval

seconds

Retry timer value

seconds

CDAT protocol timeout interval

minutes

Cluster recovery interval

minutes

Maximum retry time

seconds

Message fragment size

1,464

bytes

Send queue overflow

1,024

messages

Number of bad messages threshold

messages

Number of ack messages threshold

messages

Unreachable heartbeat ack threshold

messages

Reachable heartbeat ack threshold

messages

Unreachable heartbeat threshold

messages

Reachable heartbeat threshold

messages

Delayed ack timer

300

100

milliseconds

Message send window

messages

Enable multicast

TRUE(1)

unitless

Performance class

unitless

Ack remote fragments

FALSE(0)

unitless

Field Settings Range

Field

Minimum

Default

Maximum

Unit

Receive/send heartbeat timer ratio

unitless

Maximum retry timer ratio

unitless

Send heartbeat interval

seconds

Retry timer value

seconds

CDAT protocol timeout interval

minutes

Cluster recovery interval

minutes

Maximum retry time

seconds

Message fragment size

540

1,464

32,500

bytes

Send queue overflow

512

1,024

4,096

messages

Number of bad messages threshold

messages

Number of ack messages threshold

messages

Unreachable heartbeat ack threshold

messages

Reachable heartbeat ack threshold

messages

Unreachable heartbeat threshold

messages

Reachable heartbeat threshold

messages

Delayed ack timer

100

300

milliseconds

Message send window

messages

Enable multicast

FALSE(0)

TRUE(1)

unitless

Performance class

unitless

Ack remote fragments

FALSE(0)

TRUE(1)

unitless

Usage Notes

Results Information User Queue

Asynchronous results are returned to a user queue specified by the Results Information parameter of the API. See Cluster APIs Use of User Queues and Using Results Information for details on how to create the results information user queue, the format of the entries, and how to use the data placed on the queue. The data is sent to the user queue in the form of a message identifier and the substitution data for the message (if any). The following table lists the messages that can be sent to the user queue; the message text is shown for reference only and is not itself sent.

Message ID   Message Text
CPCBB01 C    Cluster Resource Services API &1 completed.
CPF2113 E    Cannot allocate library &1.
CPF3CF2 D    Error(s) occurred during running of &1 API.
CPF9801 E    Object &2 in library &3 not found.
CPF9802 E    Not authorized to object &2 in &3.
CPF9804 E    Object &2 in library &3 damaged.
CPF980C E    Object &1 in library &2 cannot be in an independent auxiliary storage pool.
CPF9810 E    Library &1 not found.
CPF9820 E    Not authorized to use library &1.
CPFBB24 D    Node &1 not participating in &2 API protocol.
CPFBB2D D    Timeout detected while waiting for a response.
CPFBB46 D    Cluster Resource Services internal error.
CPFBB4D D    Cluster Resource Services cannot process the request.

Error Messages

Messages that are delivered through the error code parameter are listed here. The data (messages) sent to the results information user queue are listed in the Usage Notes above.

Message ID   Error Message Text
CPF2113 E    Cannot allocate library &1.
CPF3C1E E    Required parameter &1 omitted.
CPF3C21 E    Format name &1 is not valid.
CPF3C39 E    Value for reserved field not valid.
CPF3CF1 E    Error code parameter not valid.
CPF3CF2 E    Error(s) occurred during running of &1 API.
CPF9801 E    Object &2 in library &3 not found.
CPF9802 E    Not authorized to object &2 in &3.
CPF9804 E    Object &2 in library &3 damaged.
CPF980C E    Object &1 in library &2 cannot be in an independent auxiliary storage pool.
CPF9810 E    Library &1 not found.
CPF9820 E    Not authorized to use library &1.
CPF9872 E    Program or service program &1 in library &2 ended. Reason code &3.
CPFBB02 E    Cluster &1 does not exist.
CPFBB26 E    Cluster Resource Services not active or not responding.
CPFBB32 E    Attributes of user queue &1 in library &2 are not valid.
CPFBB38 E    Library name &1 not allowed for this request.
CPFBB39 E    Current user does not have IOSYSCFG special authority.
CPFBB44 E    &1 API cannot be called from a cluster resource group exit program.
CPFBB46 E    Cluster Resource Services internal error.
CPFBB5F E    Field value within structure is not valid.
CPFBB70 E    API request &1 not compatible with current cluster version.
CPFBB86 E    Length specified in parameter &1 not valid.
CPFBBA2 E    Value &1 specified for failover wait time is not valid.
CPFBBA3 E    Value &1 specified for failover default action is not valid.

API introduced: V5R1
