Expel protocol

The expel protocol allows a provider to propose the removal of one or more providers from the group.

Some situations in which this could be useful include:
  • A provider has received an announcement notification that another provider is not responsive or has detected an internal error.
  • A provider has received an announcement notification that another provider failed to submit a vote during a previously completed n-phase protocol within the specified time limit.
  • A provider has detected through some other means that another provider is not behaving as expected in the context of the application that the group is running.

During the invocation of the expel protocol, Group Services runs a deactivate script against each provider that is being expelled. The deactivate script, which is specified by each Group Services client on initialization, is used to perform any cleanup actions that may be required.

The deactivate script does not need to be a shell script but can be any kind of executable file. For each provider that is targeted for expulsion, the Group Services daemon forks a child process that attempts to invoke the deactivate script on the provider's node.

The expel protocol is a provider-initiated protocol. Therefore, if it collides with another already-running protocol, Group Services returns it to the proposer. The proposer must resubmit the protocol; the protocol is not automatically queued.
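Because a returned proposal is not queued, the proposer has to decide when to try again. The following Python sketch shows one possible retry loop. The `submit` callable, its "accepted" and "collided" return values, the attempt limit, and the delay are all hypothetical stand-ins for illustration; they are not part of the Group Services interface.

```python
import time
from typing import Callable


def propose_with_resubmit(submit: Callable[[], str],
                          max_attempts: int = 5,
                          delay_seconds: float = 0.0) -> bool:
    """Call the hypothetical submit() until it reports "accepted".

    Returns True if the proposal was accepted, or False if every
    attempt collided with a protocol that was already running.
    """
    for attempt in range(max_attempts):
        if submit() == "accepted":
            return True
        # The proposal was returned; wait before resubmitting,
        # except after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A real provider would probably also recheck whether the expel is still needed before each resubmission, because the colliding protocol may have changed the group membership.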

A provider uses the ha_gs_expel subroutine to request an expel protocol. On input, the provider specifies the following information:
  • The number of phases for the protocol. An expel protocol may be either a one-phase or an n-phase protocol.
  • The voting time limit for each phase. Providers that are not being expelled must vote within this time limit. For providers that are being expelled, the deactivate script must complete within this time limit; otherwise, it is considered unsuccessful.
  • The list of providers to be expelled. These providers do not take part in the protocol and receive no notice of it, unless it is approved. All providers that are not targeted for expulsion take part in running the protocol, even if they had been declared nonresponsive before the protocol began.
  • A deactivate phase specifier. This value tells Group Services in which voting phase it should invoke the deactivate script. A value of 0 indicates that the deactivate script should not be invoked.
  • An expel flag. This flag is passed to the deactivate script. A null value indicates that no flag should be passed to the deactivate script.
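The input values listed above can be pictured as a single request record. The following Python sketch is only a model of that information: the class name, field names, and types are hypothetical and do not reflect the actual ha_gs_expel parameter layout. The check that a one-phase protocol cannot name a deactivate phase greater than 1 is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExpelRequest:
    """Hypothetical model of the information supplied on an expel request."""
    n_phase: bool                     # False = one-phase, True = n-phase
    time_limit_seconds: int           # voting time limit for each phase
    targets: Tuple[str, ...]          # providers to be expelled
    deactivate_phase: int = 0         # 0 = do not invoke the deactivate script
    expel_flag: Optional[str] = None  # None = pass no flag to the script

    def __post_init__(self) -> None:
        if not self.targets:
            raise ValueError("at least one provider must be targeted")
        if self.deactivate_phase < 0:
            raise ValueError("deactivate phase must be 0 or a phase number")
        # Assumption: a one-phase protocol has only phase 1 available.
        if not self.n_phase and self.deactivate_phase > 1:
            raise ValueError("a one-phase protocol has no phase beyond 1")

    @property
    def runs_deactivate_script(self) -> bool:
        """True when a non-zero deactivate phase specifier was given."""
        return self.deactivate_phase != 0
```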

For each provider that is targeted for expulsion, Group Services runs the deactivate script that was specified by that provider when it initialized itself with Group Services. The deactivate script runs on the node on which the provider that is targeted for expulsion is running. It runs during the phase and uses the flag that was specified on the expel protocol. To be successful, the deactivate script must complete within the voting time limit for the phase. To invoke the deactivate script, Group Services acts as a substitute for each provider that is being expelled.
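The time-limited invocation described above can be illustrated with an ordinary child process. The following Python sketch is not how the Group Services daemon is implemented; it only shows the general pattern of running an executable and treating a missed time limit as unsuccessful. The function name and the shape of `command` are hypothetical, and the arguments that Group Services actually passes to a deactivate script are not modeled here.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_deactivate_script(command: Sequence[str],
                          time_limit_seconds: float) -> Optional[int]:
    """Run an executable and return its exit code.

    Returns None if the executable could not be started or did not
    finish within the time limit (the child is killed in that case).
    """
    try:
        completed = subprocess.run(list(command), timeout=time_limit_seconds)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in "script": the Python interpreter exiting with code 0.
    print(run_deactivate_script(
        [sys.executable, "-c", "raise SystemExit(0)"], 30))
```

Because the deactivate script can be any executable file, the sketch takes a full command line rather than assuming a shell.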

During the expel protocol, providers that are not being expelled treat this as a normal protocol and take any action they deem appropriate. If it is an n-phase protocol, their voting responses are tallied as if it were any other n-phase protocol.

If the value of the deactivate phase specifier is 0, no deactivate script is invoked during the protocol. If the protocol is approved, the providers that are targeted for expulsion are removed from the group. Because one-phase protocols are always approved, a one-phase expel protocol with a deactivate phase specifier of 0 simply removes the targeted providers from the group. If the protocol is rejected, the targeted providers are not removed from the group.

At the start of the voting phase given by a non-zero deactivate phase specifier, Group Services runs the deactivate script against each targeted provider. If at least one provider votes to reject the protocol before this phase, the targeted providers are not removed from the group and no deactivate scripts are invoked.
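The two conditions that prevent a deactivate script from running can be written as a small predicate. This is a minimal Python sketch of the rule as stated in the text; the function and parameter names are hypothetical.

```python
from typing import Optional


def deactivate_scripts_run(deactivate_phase: int,
                           first_reject_phase: Optional[int]) -> bool:
    """Decide whether deactivate scripts are invoked during the protocol.

    deactivate_phase   -- the deactivate phase specifier (0 = never invoke)
    first_reject_phase -- the phase in which a provider first voted to
                          reject, or None if no provider voted to reject
    """
    if deactivate_phase == 0:
        return False
    # A rejection in an earlier phase ends the protocol before the
    # deactivate phase is reached.
    if first_reject_phase is not None and first_reject_phase < deactivate_phase:
        return False
    return True
```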

If the expel protocol is a one-phase protocol, and the value of the deactivate phase specifier is 1, the deactivate script is run immediately after the protocol begins running. Providers that are not targeted for expulsion receive the usual protocol approval notification, informing them that the targeted providers are now out of the group. Providers that are targeted for expulsion receive the protocol approval notification after the Group Services daemon has forked a child process to run the deactivate script. The Group Services daemon does not wait for the script to complete before it sends the notification. Therefore, it is difficult to determine whether the provider will receive the notification before or after the script runs.

The exit code of the deactivate script is not inspected, and the result is not returned to the providers that remain in the group.

If a provider that is targeted for expulsion by a one-phase expel protocol fails after the protocol has begun, no failure protocol is initiated in the group for that provider.

When a deactivate script runs successfully, it is expected to exit with an exit code of 0. Group Services treats the successful completion of the deactivate script as a vote to approve the protocol. If the protocol requires more voting phases, Group Services continues to vote APPROVE for each subsequent voting phase.

When a deactivate script does not exit with a code of 0, Group Services enters the group's current default vote value as the provider's vote for the phase. If the protocol requires more voting phases, Group Services continues to enter the current default vote value as the provider's vote for each subsequent voting phase.

If the deactivate script is to be run in a future voting phase, Group Services enters a vote of CONTINUE as the provider's vote for each interim voting phase.

If one or more providers that are targeted for expulsion did not specify a deactivate script, or specified a script that could not be run, but a non-zero deactivate phase specifier was given, then for those providers, the group's default vote value is entered for this and each subsequent voting phase. However, for providers that did specify a valid deactivate script, the script is run and its result is used to drive the voting, as previously described.
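The voting rules for a targeted provider in an n-phase expel protocol can be summarized in one function. The following Python sketch restates the rules above; the vote constants and the function name are hypothetical labels, not Group Services symbols. Using `None` to represent a script that was not specified, could not be run, or did not finish within the time limit is a modeling choice, and treating a timed-out script the same as a non-zero exit code is an assumption based on the statement that such a script is considered unsuccessful.

```python
from typing import Optional

APPROVE = "APPROVE"
REJECT = "REJECT"
CONTINUE = "CONTINUE"


def substitute_vote(phase: int,
                    deactivate_phase: int,
                    exit_code: Optional[int],
                    default_vote: str) -> str:
    """Vote entered on behalf of a provider that is being expelled.

    phase            -- the current voting phase (1-based)
    deactivate_phase -- non-zero deactivate phase specifier
    exit_code        -- script exit code, or None if no usable result
    default_vote     -- the group's current default vote value
    """
    if deactivate_phase < 1:
        raise ValueError("this rule applies only to a non-zero specifier")
    if phase < deactivate_phase:
        # The script has not run yet: keep the protocol going.
        return CONTINUE
    if exit_code == 0:
        # Successful script: approve this and every later phase.
        return APPROVE
    # Failed, missing, or unrunnable script: use the default vote.
    return default_vote
```

For example, with a deactivate phase specifier of 2 and a default vote of REJECT, the function yields CONTINUE in phase 1 and then either APPROVE or REJECT from phase 2 onward, depending on the exit code.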

When a provider fails after the expel protocol begins but before the Group Services daemon has forked a child process to run the deactivate script, Group Services passes a process ID of 0 to the deactivate script. The deactivate script is still run and the exit code is used to determine the vote for this provider, as previously described.

Group Services tallies the votes for voting phases in the normal manner. If the expel protocol is approved, the providers that are targeted for expulsion are removed from the group. Remaining providers and subscribers are notified.

Group Services sends the protocol approval notification to expelled providers that did not exit in the course of running the deactivate script. However, Group Services does not verify that such providers receive or process the notification. Because they are no longer in the group, expelled providers cannot submit protocols and do not receive notifications related to the group.

In the event that the protocol is rejected for any reason, the providers that are targeted for expulsion are not removed from the group. However, if the deactivate script causes a provider to exit, Group Services initiates a failure leave protocol for that provider.

When a single process is joined as providers to multiple groups, and one of those provider instances has been expelled from a group, the effect on the other instances is as follows:
  • If the process no longer exists (it is killed or has failed) as a result of the expel protocol, the other provider instances of the process are handled through failure leave protocols in their groups.
  • If the process still exists, the other provider instances of the process are not affected and continue as full participants in their groups.

If a single process is joined as providers to multiple groups, and two or more of the groups are simultaneously running expel protocols that target those providers (because the process is unresponsive, for example), the order in which deactivate scripts are run against the process is not defined by Group Services. Because each group's expel protocol proceeds independently, Group Services does not coordinate the invocation of the deactivate script for each group's protocol. If all groups approve their expel protocols and the process is killed, no failure leave protocols are invoked. If one or more groups reject their expel protocols, but the process is killed in the course of running the deactivate script, those groups initiate failure leave protocols to remove the failed provider.
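The multiple-group cases can be combined into one rule: a group runs a failure leave protocol for the process only if the process no longer exists and that group did not already remove the provider through an approved expel protocol. The following Python sketch expresses that rule; the function name, parameter names, and group names are hypothetical.

```python
from typing import Dict, Iterable, Set


def groups_needing_failure_leave(expel_results: Dict[str, bool],
                                 other_groups: Iterable[str],
                                 process_exists: bool) -> Set[str]:
    """Return the groups that initiate a failure leave protocol.

    expel_results  -- group name -> True if that group's expel protocol
                      was approved, False if it was rejected
    other_groups   -- groups the process belongs to that ran no expel
    process_exists -- whether the process survived the deactivate scripts
    """
    if process_exists:
        # Surviving provider instances continue as full participants.
        return set()
    rejected = {group for group, approved in expel_results.items()
                if not approved}
    return rejected | set(other_groups)
```

The order in which the deactivate scripts ran does not appear in the sketch, which matches the statement that Group Services does not define or coordinate that order.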