(Optional) Autoscaling

To handle requests from a user, RSE API dedicates a user session that consists of base persistent threads plus the threads or processes required to handle a specific request. The session consumes threads, memory, and other z/OS® system resources.

To support a high load of user sessions and requests, RSE API uses an autoscaling mechanism that operates on a chain of a primary server and one or more overflow servers. When resource usage at the current server reaches its threshold, a request is forwarded. The requests are dynamically distributed among the servers based on their current active user session count, thread count, and memory usage.
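The threshold-based distribution along a chain can be illustrated with a small sketch. This is not RSE API code; the server names, resource counters, and per-request costs below are illustrative assumptions, chosen only to show how a request falls through to the first server in the chain whose thresholds are not yet reached.

```python
from dataclasses import dataclass

@dataclass
class Server:
    """Illustrative server with usage counters checked against thresholds."""
    name: str
    max_sessions: int
    max_threads: int
    max_memory_mb: int
    sessions: int = 0
    threads: int = 0
    memory_mb: int = 0

    def has_capacity(self, threads_needed: int, memory_mb_needed: int) -> bool:
        # A server accepts a request only while all three usage
        # thresholds (sessions, threads, memory) hold.
        return (self.sessions < self.max_sessions
                and self.threads + threads_needed <= self.max_threads
                and self.memory_mb + memory_mb_needed <= self.max_memory_mb)

def route_request(chain, threads_needed=3, memory_mb_needed=16):
    """Walk the chain starting at the primary server; place the request
    on the first server whose thresholds are not yet reached."""
    for server in chain:
        if server.has_capacity(threads_needed, memory_mb_needed):
            server.sessions += 1
            server.threads += threads_needed
            server.memory_mb += memory_mb_needed
            return server.name
    raise RuntimeError("all servers in the chain are at their thresholds")
```

With a primary limited to one session and one overflow server, the first request lands on the primary and the second overflows to the next server in the chain.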

Note: A similar effect can be achieved with stand-alone servers and a central management scheme, such as statically grouping users per server, that directs each user's requests to a designated server. RSE API overflow autoscaling removes the need for such a management component.

The listening port of the primary server is the main reception point for all incoming requests from users. When its resource usage thresholds are reached, the primary server forwards the request either to the next direct overflow server or to an appropriate overflow server selected from an overflow port map, which is constructed dynamically from the request-handling history of the chain.
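The role of the overflow port map can be sketched as a lookup learned from handling history. The class, method names, session identifiers, and port numbers here are hypothetical, assumed only to show the idea: requests for a session that an overflow server already handled are forwarded straight to that server's port, while new work goes to the next direct overflow server.

```python
class OverflowPortMap:
    """Hypothetical sketch of a history-built port map: the primary records
    which overflow port handled each session and forwards follow-up
    requests for that session there directly."""

    def __init__(self, next_overflow_port: int):
        # Fallback target when no history exists for a session.
        self.next_overflow_port = next_overflow_port
        self._history: dict[str, int] = {}  # session id -> overflow port

    def record(self, session_id: str, port: int) -> None:
        """Remember which overflow port handled this session."""
        self._history[session_id] = port

    def forward_port(self, session_id: str) -> int:
        """Known sessions go to the overflow that already holds their
        state; unknown sessions go to the next direct overflow server."""
        return self._history.get(session_id, self.next_overflow_port)
```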

You must start the primary server in advance to provide the service to users. The overflow servers can be started manually in advance, or automatically during request handling when the need for an overflow server is detected. RSE API performs the automatic startup of an overflow server by invoking the startup command script. Each server starts and monitors only its own direct overflow server. In this release, you must shut down all overflow servers manually.
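The automatic-startup decision can be sketched as a simple guard. The function name, the session-count trigger, and the `start_overflow` callable standing in for the startup command script are all assumptions for illustration; the actual detection criteria and script invocation belong to RSE API.

```python
def maybe_start_overflow(active_sessions: int, session_threshold: int,
                         overflow_running: bool, start_overflow) -> bool:
    """Illustrative guard: when the usage threshold is reached and this
    server's direct overflow is not yet running, invoke the startup
    command script (abstracted here as the `start_overflow` callable).
    Returns True if a startup was triggered."""
    if active_sessions >= session_threshold and not overflow_running:
        start_overflow()
        return True
    return False
```

A server applies this check only to its own direct overflow server; once that overflow is running, further requests overflow along the chain rather than triggering another startup from the same server.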

Depending on the system resource configuration, a host can run multiple overflow chains, and each chain is configured and operates independently.

Note: RSE API instances can be deployed in multiple chains when integrated with a load balancer such as WLM, where the primary servers share a common WLM port and each has its own overflow server chain.