Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo) API

Required Parameter Group:

1   Receiver variable             Output   Char(*)
2   Length of receiver variable   Input    Binary(4)
3   Cluster name                  Input    Char(10)
4   Format name                   Input    Char(8)
5   Error code                    I/O      Char(*)

  Service Program: QCSTCTL1

  Default Public Authority: *USE

  Threadsafe: Yes

The Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo) API retrieves information about the cluster performance and configuration parameters on a requesting node. The requesting node does not need to be active in the cluster to retrieve the information. This API may be called from a cluster resource group exit program.

Authorities and Locks

None

Required Parameter Group

Receiver variable

OUTPUT; CHAR(*)

The receiver variable that receives the information requested. You can specify the size of the area to be smaller than the format requested as long as you specify the length parameter correctly. As a result, the API returns only the data that the area can hold.

Length of receiver variable

INPUT; BINARY(4)

The length of the receiver variable provided. The length of receiver variable parameter may be specified up to the size of the receiver variable specified in the user program. If the length of receiver variable parameter specified is larger than the allocated size of the receiver variable specified in the user program, the results are not predictable. The minimum length is 8 bytes.

Cluster name

INPUT; CHAR(10)

The name of the cluster for which the information is being retrieved.

Format name

INPUT; CHAR(8)

The content and format of the information that is returned. The possible format names are as follows:

RCRS0100

Returns information about the current settings of the cluster performance and configuration parameters. These parameters may be changed using the Change Cluster Resource Services (QcstChgClusterResourceServices) API.

Error code

I/O; CHAR(*)

The structure in which to return error information. For the format of the structure, see Error code parameter.

RCRS0100 Format

Offset (Dec)   Offset (Hex)   Type        Field
0              0              BINARY(4)   Bytes returned
4              4              BINARY(4)   Bytes available
8              8              CHAR(4)     Reserved
12             C              BINARY(4)   Configuration tuning level
16             10             BINARY(8)   Receive/Send heartbeat timer ratio
24             18             BINARY(8)   Maximum retry timer ratio
32             20             BINARY(8)   Send heartbeat interval
40             28             BINARY(8)   Retry timer value
48             30             BINARY(8)   CDAT protocol timeout interval
56             38             BINARY(8)   Cluster recovery interval
64             40             BINARY(8)   Maximum retry time
72             48             BINARY(8)   Message fragment size
80             50             BINARY(8)   Send queue overflow
88             58             BINARY(8)   Number of bad messages threshold
96             60             BINARY(8)   Number of ack messages threshold
104            68             BINARY(8)   Unreachable heartbeat ack threshold
112            70             BINARY(8)   Reachable heartbeat ack threshold
120            78             BINARY(8)   Unreachable heartbeat threshold
128            80             BINARY(8)   Reachable heartbeat threshold
136            88             BINARY(8)   Delayed ack timer
144            90             BINARY(8)   Message send window
152            98             BINARY(8)   Enable multicast
160            A0             BINARY(8)   Performance class
168            A8             BINARY(8)   Ack remote fragments
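The layout above can be decoded mechanically once the receiver variable has been filled. The following is a minimal sketch, not an IBM-supplied interface: it assumes the fields are contiguous in the order listed (two BINARY(4) values, a CHAR(4) reserved area, one BINARY(4) value, then twenty BINARY(8) values), that integers are signed and big-endian, and that the buffer has already been obtained by some other means. The names parse_rcrs0100, RCRS0100_LAYOUT, and the dictionary keys are hypothetical.

```python
import struct

# Names for the twenty BINARY(8) fields, in the order the RCRS0100 table lists them.
RCRS0100_BINARY8_FIELDS = [
    "receive_send_heartbeat_timer_ratio",
    "maximum_retry_timer_ratio",
    "send_heartbeat_interval",
    "retry_timer_value",
    "cdat_protocol_timeout_interval",
    "cluster_recovery_interval",
    "maximum_retry_time",
    "message_fragment_size",
    "send_queue_overflow",
    "number_of_bad_messages_threshold",
    "number_of_ack_messages_threshold",
    "unreachable_heartbeat_ack_threshold",
    "reachable_heartbeat_ack_threshold",
    "unreachable_heartbeat_threshold",
    "reachable_heartbeat_threshold",
    "delayed_ack_timer",
    "message_send_window",
    "enable_multicast",
    "performance_class",
    "ack_remote_fragments",
]

# Assumption: big-endian, signed integers, no padding between fields.
RCRS0100_LAYOUT = struct.Struct(">ii4si20q")


def parse_rcrs0100(buffer: bytes) -> dict:
    """Decode a complete RCRS0100 receiver variable into a dictionary."""
    if len(buffer) < RCRS0100_LAYOUT.size:
        raise ValueError("buffer is shorter than a full RCRS0100 structure")
    values = RCRS0100_LAYOUT.unpack_from(buffer)
    record = {
        "bytes_returned": values[0],
        "bytes_available": values[1],
        # values[2] is the reserved CHAR(4) area and is skipped.
        "configuration_tuning_level": values[3],
    }
    record.update(zip(RCRS0100_BINARY8_FIELDS, values[4:]))
    return record
```

Under these assumptions the structure is 176 bytes long, which agrees with the last field starting at decimal offset 168.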

Field Descriptions

Note: Units and ranges for the fields described here can be found in the Field Settings Range table located at the end of the Field Descriptions section of the Change Cluster Resource Services API.

Ack remote fragments. Provides a switch to enable or disable a cluster messaging level acknowledgment for receipt of each fragment sent to a remote cluster node. Fragments are sent by the cluster messaging service for each cluster message whose size is greater than the specified message fragment size. Remote cluster nodes are defined to be any nodes not on the local LAN (having a network or subnet IP address other than that of the source node for the message). Acknowledging remote fragments may be desirable in those few cases where low bandwidth gateways, routers, or bridges exist between local and remote systems. The valid values for this field are:

0   Acknowledgments are disabled.
1   Acknowledgments are enabled.

Bytes available. The number of bytes of data available to be returned to the user.

Bytes returned. The number of bytes of data returned to the user.
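Comparing these two fields tells a caller whether the receiver variable was large enough. The sketch below shows one common way to use them: call once with the 8-byte minimum, read Bytes available, then call again with a receiver of that size. It is an illustration only. The call_api argument is a hypothetical callable that wraps the real API call and returns the receiver variable as bytes, and big-endian signed BINARY(4) values are assumed.

```python
import struct


def split_header(receiver: bytes):
    """Return (bytes_returned, bytes_available) from the first 8 bytes."""
    if len(receiver) < 8:
        raise ValueError("receiver variable must be at least 8 bytes")
    return struct.unpack_from(">ii", receiver, 0)


def retrieve_all(call_api, initial_length: int = 8) -> bytes:
    """Call once with a small receiver, then again if data was left behind.

    call_api(length) is a hypothetical wrapper around the real API call.
    """
    receiver = call_api(initial_length)
    returned, available = split_header(receiver)
    if available > returned:
        # The first receiver was too small; ask again with the reported size.
        receiver = call_api(available)
    return receiver
```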

CDAT protocol timeout interval. The timeout value used for distributing the Cluster Destination Address Table (CDAT) and synchronizing cluster communications when doing a create cluster, add node, or start node process. As the number of nodes in the cluster increases, the time required to run this synchronizing protocol increases. This is a low level Cluster Resource Services start-up protocol.

Cluster recovery interval. The interval at which a cluster node takes inventory of required recovery actions and attempts automatic recovery as necessary. Those items checked are:

Unreachable alternate point-to-point IP addresses for remote nodes.
Unreachable multicast IP address for the local subnet.
Partitioned nodes.

Configuration tuning level. The current fast path setting of the cluster performance and configuration parameters. The individual parameter settings for each fast path option are defined in the Field Settings Range table in the Change Cluster Resource Services API documentation. The valid values for this field are:

0   Settings have been adjusted individually and are not currently set to one of the fast path settings.
1   Adjustments are made to cluster communications to decrease the heartbeating frequency and increase the various message timeout values. With fewer heartbeats and longer timeout values, the cluster will be slower to respond (less sensitive) to communications failures.
2   Normal default values are used for cluster communications performance and configuration parameters.
3   Adjustments are made to cluster communications to increase the heartbeating frequency and decrease the various message timeout values. With more frequent heartbeats and shorter timeout values, the cluster will be quicker to respond (more sensitive) to communications failures.

Delayed ack timer. The timer used over inbound reliable messages to force an acknowledgment for unacknowledged messages should the sender not have requested an acknowledgment over the last delayed ack time period. This timer is started on receipt of a reliable message and stopped when an acknowledgment is sent for one or more unacknowledged messages.

Enable multicast. The cluster communications infrastructure makes use of User Datagram Protocol (UDP) multicast capabilities as the preferred protocol for sending cluster management information between nodes in a cluster. Where multicast capabilities are supported by the underlying physical media, cluster communications will utilize the UDP multicast to send management messaging from a given node to all local cluster nodes supporting the same subnet address. Messages being sent to nodes on remote networks will always be sent using UDP point to point capabilities. Cluster communications does not rely on routing capability of multicast messages.

The multicast traffic supporting cluster management messaging tends by nature to be bursty. Depending on the number of nodes on a given LAN (supporting a common subnet address) and the complexity of the cluster management structure that is chosen by the cluster administrator, cluster related multicast packets can easily exceed 40 packets/second. Bursts of this nature could have a negative impact on older networking equipment. One example would be congestion problems on devices on the LAN serving as Simple Network Management Protocol (SNMP) agents that need to evaluate each and every UDP multicast packet. Some of the earlier networking equipment does not have adequate bandwidth to keep up with this type of traffic. Ensure that the network administrator has reviewed the capacity of the networks to handle UDP multicast traffic, to make certain that clustering will not have a negative impact on the health and performance of the networks over which it operates.

If the network administrator does not want the more efficient multicast capabilities to be used, setting this field to FALSE (0) disables the multicast capabilities of the cluster, and only point to point communications are used by the cluster messaging services. The valid values for this field are:

0   Multicast is disabled.
1   Multicast is enabled.
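The rule described for this field (multicast only to nodes on the same subnet, point to point otherwise or when multicast is disabled) can be sketched as a small decision function. This is an illustration under stated assumptions, not the cluster's actual implementation: the function name, the returned labels, and the idea of passing the local interface as an address with a prefix length are all hypothetical, and the check for whether the physical media supports multicast is left out.

```python
import ipaddress


def choose_transport(source_interface: str, destination_ip: str,
                     multicast_enabled: bool) -> str:
    """Pick a transport label for a management message.

    source_interface is the sending node's address with prefix length,
    for example "192.168.1.10/24" (a made-up value).
    """
    local_network = ipaddress.ip_interface(source_interface).network
    same_subnet = ipaddress.ip_address(destination_ip) in local_network
    if multicast_enabled and same_subnet:
        return "udp-multicast"
    # Remote nodes, or any node when multicast is disabled, use point to point.
    return "udp-point-to-point"
```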

Maximum retry time. Reliable messages are resent at exponentially increasing intervals if they time out (that is, do not receive a timely acknowledgment). The initial timeout value for a message is the Retry timer value, and each successive retry doubles the timeout until the Maximum retry time is exceeded. For the default case, a message would be sent, resent 1 second later, then resent after a further 2 seconds, 4 seconds, and finally 8 seconds. This represents a total of 15 seconds, after which alternate IP addresses are tried with the same timer values.
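The doubling schedule in this description can be reproduced with a few lines of arithmetic. The sketch below is an illustration only; the function name is hypothetical, and the values 1 and 8 are the default-case figures quoted in the text above.

```python
def retry_delays(retry_timer_value: int, maximum_retry_time: int) -> list:
    """List the successive resend delays, doubling until the maximum is passed."""
    if retry_timer_value <= 0:
        raise ValueError("retry timer value must be positive")
    delays = []
    delay = retry_timer_value
    while delay <= maximum_retry_time:
        delays.append(delay)
        delay *= 2
    return delays


# Default case from the text: initial timeout of 1 second, maximum of 8 seconds.
schedule = retry_delays(1, 8)
total_seconds = sum(schedule)
```

With those inputs the schedule is 1, 2, 4, and 8 seconds, which sums to the 15 seconds stated in the field description.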

Maximum retry timer ratio. Remote subnets (remote cluster nodes on another LAN/WAN/BUS supporting a different subnet IP address than the sending node) use an extended message timeout value that is based on the Maximum retry time used for local subnets (local cluster nodes supporting the same subnet IP address). For the default case, the Maximum retry time for a local multicast message would be 8 seconds and for a remote point to point message would be 8 x 8 = 64 seconds. This allows for network routing considerations.
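The relationship between the local and remote limits is a single multiplication, shown here as a minimal sketch. The function name and the is_remote flag are hypothetical; the figures 8 and 8 come from the default case in the text.

```python
def effective_maximum_retry_time(maximum_retry_time: int,
                                 maximum_retry_timer_ratio: int,
                                 is_remote: bool) -> int:
    """Return the retry limit that applies to a local or a remote destination."""
    if is_remote:
        # Remote subnets get the local limit extended by the ratio.
        return maximum_retry_time * maximum_retry_timer_ratio
    return maximum_retry_time


local_limit = effective_maximum_retry_time(8, 8, is_remote=False)
remote_limit = effective_maximum_retry_time(8, 8, is_remote=True)
```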

Message fragment size. Cluster communications fragments its own messages. This fragment size should be set consistent with the physical media and routing capabilities throughout the network used for clustering. The preferred settings allow for the largest fragment size possible that does not exceed any of the hardware Maximum Transmission Units defined over the entire path so that clustering does all of the fragmentation, not the intermediary networks. The default is set to assume a minimum 1500 byte (less network header space) Ethernet environment.
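To make the effect of this setting concrete, the sketch below splits a message into pieces no larger than a given fragment size. It only illustrates the arithmetic; it does not model any headers the cluster messaging service may add, and the function name and the sizes used in the example are made up.

```python
def fragment_message(message: bytes, fragment_size: int) -> list:
    """Split a message into consecutive fragments of at most fragment_size bytes."""
    if fragment_size <= 0:
        raise ValueError("fragment size must be positive")
    return [message[start:start + fragment_size]
            for start in range(0, len(message), fragment_size)]


# Hypothetical sizes: a 2500-byte message with a 1000-byte fragment size.
fragments = fragment_message(bytes(2500), 1000)
fragment_lengths = [len(fragment) for fragment in fragments]
```

A message that is no larger than the fragment size comes back as a single piece, which matches the statement that only larger messages are fragmented.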

Message send window. The number of messages allowed outstanding without having received an acknowledgment. The higher the number, the lower the message latency but the larger the required buffer space on a node to save inbound messages.

Number of ack messages threshold. The number of repeat messages that are received over the course of a cluster recovery interval before acknowledgments are sent to multiple source IP addresses for a given node instead of just the current primary address for each message received. While increasing the number of ACKs flowing, this reduces the message resends and latency given that an intermittent communications condition is detected. Eventually, one of the node addresses should be marked as failed and at cluster recovery time, messaging will settle back down using single acknowledgments.

Number of bad messages threshold. The number of undeliverable messages per cluster recovery interval allowed before a failing status is assigned to a node's internet address. At this time, a secondary address (if available) is assigned to be the new primary IP address for the subject remote node.
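The failover behavior described here can be sketched as a counter that is reset every cluster recovery interval. This is a hypothetical model, not the actual implementation: the class and method names are invented, the text does not say whether the status changes when the count reaches or exceeds the threshold (this sketch assumes it must exceed it), and the sketch only records a failure when a secondary address exists to take over.

```python
class RemoteNodeAddresses:
    """Track the primary IP address used to reach one remote node."""

    def __init__(self, addresses, bad_messages_threshold: int):
        if not addresses:
            raise ValueError("at least one address is required")
        self.candidates = list(addresses)  # first entry is the current primary
        self.failed = []
        self.bad_messages_threshold = bad_messages_threshold
        self.bad_messages = 0

    @property
    def primary(self) -> str:
        return self.candidates[0]

    def record_undeliverable(self) -> str:
        """Count one undeliverable message and fail over if the limit is passed."""
        self.bad_messages += 1
        # Assumption: the count must exceed the threshold to trigger failover.
        if (self.bad_messages > self.bad_messages_threshold
                and len(self.candidates) > 1):
            self.failed.append(self.candidates.pop(0))
            self.bad_messages = 0
        return self.primary

    def start_recovery_interval(self) -> None:
        """Reset the count at the start of each cluster recovery interval."""
        self.bad_messages = 0
```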

Performance class. The requested performance characteristics of the cluster communications messaging protocol. Pacing is selectively used for sending out fragments of large messages. Messages are fragmented by the cluster messaging service at the specified message fragment size. The pacing mechanism releases a set number of fragments to the underlying physical layer, then delays, then releases a next set. This is to avoid over running slower physical media. Local here refers to nodes on a local LAN. Remote refers to messaging to cluster nodes on other than the local LAN. Valid values for the performance class are as follows:

0   Normal: Pacing applied to local and remote fragments.
1   High Throughput Local: Pacing applied to remote fragments.
2   High Throughput Local and Remote: No pacing of any fragmented messages.
3   High Throughput Remote: Pacing applied to local fragments.
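The pacing mechanism (release a set of fragments, delay, release the next set) and the way each performance class applies it can be sketched as follows. This is an illustration under assumptions: the class names are taken from the descriptions above, but the function names, the burst_size parameter, and the send and pause callbacks are hypothetical, and the real burst size and delay are not given in this document.

```python
# Which destinations are paced for each performance class, per the descriptions.
PACED_DESTINATIONS = {
    "Normal": {"local", "remote"},
    "High Throughput Local": {"remote"},
    "High Throughput Local and Remote": set(),
    "High Throughput Remote": {"local"},
}


def is_paced(performance_class: str, destination: str) -> bool:
    """Return True if fragments to this destination ("local" or "remote") are paced."""
    return destination in PACED_DESTINATIONS[performance_class]


def send_fragments(fragments, performance_class, destination,
                   send, pause, burst_size=4):
    """Send fragments, pausing between bursts only when pacing applies.

    burst_size is a made-up value; send and pause are caller-supplied callbacks.
    """
    if not is_paced(performance_class, destination):
        for fragment in fragments:
            send(fragment)
        return
    for start in range(0, len(fragments), burst_size):
        if start:
            pause()  # delay before releasing the next set
        for fragment in fragments[start:start + burst_size]:
            send(fragment)
```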

Reachable heartbeat ack threshold. A node becomes reachable (formerly having been marked as unreachable) from a Cluster Communications heartbeating perspective if "Reachable heartbeat ack threshold" (or more) heartbeat message ACKs are received for the last "Reachable heartbeat threshold" heartbeat messages sent to a node. For the default case, a node becomes reachable if three or more of the last four heartbeats sent to the marked unreachable node are now acknowledged.

Reachable heartbeat threshold. See Reachable heartbeat ack threshold field description.

Receive/Send heartbeat timer ratio. Ratio of incoming heartbeat messages expected from a neighboring node to the number of heartbeat messages that are sent out. The send rate is always set higher to ensure that a neighboring node's receive heartbeat timer does not fire under normal operational circumstances.

Reserved. This field will contain hexadecimal zeroes.

Retry timer value. See Maximum retry time field description.

Send heartbeat interval. The interval at which a low level Cluster Communications heartbeat message is sent to a neighboring node.

Send queue overflow. The maximum number of messages that are allowed to be queued up in a Cluster Communications (CC) outbound message queue. The CC send queues are distributed among the various Distributed Activity (DA) groups. The larger the number, the greater the memory resources that are required to support cluster messaging. If a send queue overflows for a given DA, the inability to send a message could lead to the termination of that DA because of the lack of resources on a node.

Unreachable heartbeat ack threshold. A reachable node becomes unreachable from a Cluster Communications heartbeating perspective if "Unreachable heartbeat ack threshold" (or fewer) heartbeat message ACKs are received for the last "Unreachable heartbeat threshold" heartbeat messages sent to a node. For the default case, a node becomes unreachable if one or fewer of the last four heartbeats sent to the marked reachable node are acknowledged.

Unreachable heartbeat threshold. See Unreachable heartbeat ack threshold field description.
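The four heartbeat threshold fields together describe a two-way rule with hysteresis: a reachable node is marked unreachable when too few recent heartbeats are acknowledged, and an unreachable node is marked reachable again when enough are. The sketch below models that rule using the default-case numbers from the text (one or fewer of the last four, three or more of the last four). The class name and method are hypothetical, and the sketch assumes that no decision is made until a full window of heartbeats has been sent.

```python
from collections import deque


class HeartbeatMonitor:
    """Track reachability of one node from heartbeat acknowledgments."""

    def __init__(self, reachable_ack_threshold=3, reachable_threshold=4,
                 unreachable_ack_threshold=1, unreachable_threshold=4,
                 reachable=True):
        self.reachable_ack_threshold = reachable_ack_threshold
        self.reachable_threshold = reachable_threshold
        self.unreachable_ack_threshold = unreachable_ack_threshold
        self.unreachable_threshold = unreachable_threshold
        self.reachable = reachable
        self.history = deque(maxlen=max(reachable_threshold,
                                        unreachable_threshold))

    def record(self, acked: bool) -> bool:
        """Record whether the latest heartbeat was acknowledged; return the state."""
        self.history.append(acked)
        if self.reachable:
            window = list(self.history)[-self.unreachable_threshold:]
            # Assumption: wait for a full window before changing state.
            if (len(window) == self.unreachable_threshold
                    and sum(window) <= self.unreachable_ack_threshold):
                self.reachable = False
        else:
            window = list(self.history)[-self.reachable_threshold:]
            if (len(window) == self.reachable_threshold
                    and sum(window) >= self.reachable_ack_threshold):
                self.reachable = True
        return self.reachable
```

Because the two thresholds differ, a node with exactly two of its last four heartbeats acknowledged keeps whatever state it already had.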

Error Messages

Messages that are delivered through the error code parameter are listed here.

Message ID    Error Message Text
CPF3C1E E     Required parameter &1 omitted.
CPF3C21 E     Format name &1 is not valid.
CPF3C24 E     Length of the receiver variable is not valid.
CPF3CF1 E     Error code parameter not valid.
CPF3CF2 E     Error(s) occurred during running of &1 API.
CPF9872 E     Program or service program &1 in library &2 ended. Reason code &3.
CPFBB02 E     Cluster &1 does not exist.
CPFBB70 E     API request &1 not compatible with current cluster version.

API introduced: V5R1
