Shutting down custom system services gracefully

If you are running a custom system service as an EGO service, you can specify a script to clean up and shut down service instances. If you do not have a shutdown script, you can simply enable service instances to be shut down. In both cases, you can configure a timeout during which the system waits for the target instance to exit. The system ends the target instance if it is still running after the timeout expires.

About this task

When the EGO service controller (EGOSC) wants to shut down a service, it starts the job control command on the same host with the same initial environment as the container to be terminated. If a timeout is defined, the service controller waits for the service to shut down within the duration of the "grace period". After the grace period has passed, if the instance container is still alive, SIGKILL is sent to terminate the container. The job controller process is also killed when SIGKILL is sent.
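The sequence described above can be modeled with a short Python sketch. It is only an illustration of the ordering (start the job controller, wait for the grace period, then send SIGKILL), not EGOSC code. The function name `shut_down_instance` and its arguments are hypothetical.

```python
import subprocess
import sys


def shut_down_instance(instance, job_controller_cmd, grace_seconds):
    """Run the job controller, then wait up to grace_seconds for the instance.

    Returns "exited" if the instance stopped within the grace period and
    "killed" if it had to be ended with SIGKILL. Hypothetical helper that
    only models the documented sequence.
    """
    controller = subprocess.Popen(job_controller_cmd)
    try:
        instance.wait(timeout=grace_seconds)
        outcome = "exited"
    except subprocess.TimeoutExpired:
        instance.kill()            # SIGKILL on POSIX systems
        instance.wait()
        outcome = "killed"
        if controller.poll() is None:
            controller.kill()      # the job controller is killed as well
    controller.wait()              # reap the job controller process
    return outcome


if __name__ == "__main__":
    # Stand-in for a service instance that ignores the job controller.
    service = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # Stand-in job controller that does nothing, so the grace period expires.
    print(shut_down_instance(service, [sys.executable, "-c", "pass"], 1))
```

In this model a job controller that stops the instance in time yields "exited", and anything else ends in "killed" once the grace period has passed.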

Procedure

  1. From the cluster management console, add and configure the following parameters in the service profile under ServiceDefinition > ActivityDescription > ActivitySpecification:
    • JobController: Specify the path to a script for cleanup and shutdown operations. If the job controller fails, EGO forcibly terminates the service instance.

      If you do not have a script, specify gracefulshutdown within the JobController parameter to enable service instances to be shut down. Use gracefulshutdown to simply send a SIGTERM to the processes in the container. With this configuration, the container to be terminated is marked for graceful shutdown when it is started. After the grace period (specified in ControlWaitPeriod) has passed, if the instance container is still alive, SIGKILL is sent to terminate the container.

    • ControlWaitPeriod: (Optional) Define the grace period in the format PTnHnMnS, which means n hours, n minutes, and n seconds. For example, PT10M0S means 10 minutes, and PT60S means 60 seconds. The valid range is 0 to 1 hour. If the value is outside this range, EGOSC does not load the service. The default value is 2 minutes.

    For general information on adding parameters, see Updating a service.
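To check a ControlWaitPeriod value before you add it to the service profile, a small validator along the following lines may help. This is a minimal sketch based only on the format and range described above. The function name is hypothetical, and accepting unit letters in either case is an assumption of the sketch, not confirmed EGO behavior.

```python
import re

# PTnHnMnS with each part optional; case-insensitive matching is an assumption.
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", re.IGNORECASE)
MAX_SECONDS = 3600  # upper bound of the documented range (1 hour)


def control_wait_period_seconds(value):
    """Convert a PTnHnMnS string to seconds, or raise ValueError."""
    match = _DURATION.fullmatch(value)
    if match is None or not any(match.groups()):
        raise ValueError("not in PTnHnMnS format: %r" % value)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    total = hours * 3600 + minutes * 60 + seconds
    if total > MAX_SECONDS:
        raise ValueError("%r is outside the 0 to 1 hour range" % value)
    return total


if __name__ == "__main__":
    for candidate in ("PT10M0S", "PT60S", "PT2H"):
        try:
            print(candidate, control_wait_period_seconds(candidate))
        except ValueError as err:
            print(candidate, "rejected:", err)
```

A value that this sketch rejects is likely to prevent the service from loading, so correct it before you update the profile.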

  2. Verify that your service is stopped after the grace period instead of being killed immediately. Allow for a delay of 5 seconds or more beyond the configured grace period.

    Check the EGOSC log under $EGO_ESRVDIR/esc/log for a message similar to the following:

    2009-04-01 09:18:46.000 CST WARN [13769] do_containerStateChange(): on host <bjg270-01>, the container <3> belongs to instance <1> of service <plc> terminated, reason <Terminated by job controller>, status <1>

What to do next

If you need to troubleshoot, use the following tips:

Job Controller did not kill the service:

Check the EGOSC log under $EGO_ESRVDIR/esc/log. You should see:

2009-04-01 09:17:17.000 CST WARN [13769] do_containerStateChange(): on host <bjg270-01>, the container <9> belongs to instance <1> of service <test> terminated, reason <Terminated by SIGKILL, job controller does not exist or failed>, status <0>
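The termination messages shown in this topic differ mainly in their reason field, so a script can tell a graceful stop from a forced kill. The following Python sketch extracts the fields from such a line. The regular expression is derived only from the two sample messages in this topic, so treat it as an assumption about the log format, and the function names are hypothetical.

```python
import re

# Pattern inferred from the sample do_containerStateChange() messages.
_TERMINATED = re.compile(
    r"on host <(?P<host>[^>]*)>, the container <(?P<container>[^>]*)> "
    r"belongs to instance <(?P<instance>[^>]*)> of service <(?P<service>[^>]*)> "
    r"terminated, reason <(?P<reason>[^>]*)>, status <(?P<status>[^>]*)>"
)


def parse_termination(line):
    """Return the fields of a termination message as a dict, or None."""
    match = _TERMINATED.search(line)
    return match.groupdict() if match else None


def ended_by_sigkill(line):
    """True if the message reports that the container was ended by SIGKILL."""
    info = parse_termination(line)
    return info is not None and "SIGKILL" in info["reason"]
```

For example, `ended_by_sigkill` returns True for the message above and False for a message whose reason is "Terminated by job controller".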

The service cannot be loaded by EGOSC:
  • If only ControlWaitPeriod was added to the service profile, you would see the following messages in the EGOSC log:

    2009-04-01 11:49:31.000 CST ERROR [8946] validContainerSpec(): Conflict parameters, controlWaitPeriod is defined but JobController is not defined, refused

    2009-04-01 11:49:31.000 CST ERROR [8946] loadServiceDefinition(): parse section ServiceDefinition failed

    2009-04-01 11:49:31.000 CST ERROR [8946] loadServiceDefinition():parse service definition file /opt/ego/eservice/esc/conf/services/test.xml failed

    2009-04-01 11:49:31.000 CST ERROR [8946] loadServices(): failed to load service definition from </opt/ego/eservice/esc/conf/services/test.xml>

    Add the JobController parameter to the service profile.

  • If ControlWaitPeriod in the service profile is less than 0 or greater than 1 hour, you would see the following messages in the EGOSC log:

    2009-04-01 12:25:30.000 CST ERROR [10321] validContainerSpec(): Invalid controlWaitPeriod, refused

    2009-04-01 12:25:30.000 CST ERROR [10321] loadServiceDefinition(): parse section ServiceDefinition failed

    2009-04-01 12:25:30.000 CST ERROR [10321] loadServiceDefinition():parse service definition file /opt/ego/eservice/esc/conf/services/test.xml failed

    2009-04-01 12:25:30.000 CST ERROR [10321] loadServices(): failed to load service definition from </opt/ego/eservice/esc/conf/services/test.xml>

    Set ControlWaitPeriod to a value in the range of 0 to 1 hour.

The service is terminated after 2 minutes:
  • If ControlWaitPeriod is defined as PT0H0M0S, PT0M0S, or PT0S, EGO will set the value of ControlWaitPeriod to 2 minutes, and at the same time remove ControlWaitPeriod from the service profile. Define ControlWaitPeriod again and set the value to be greater than 2 minutes.
  • If only JobController is defined, ControlWaitPeriod takes its default value of 2 minutes. To use a different grace period, define ControlWaitPeriod explicitly.
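The troubleshooting rules in this topic can be summarized as a small decision function. The Python sketch below restates only what is described here: it is not EGO code, the function name is hypothetical, and it takes the grace period already converted to seconds.

```python
DEFAULT_SECONDS = 120   # documented default of 2 minutes
MAX_SECONDS = 3600      # documented upper bound of 1 hour


def effective_grace_period(job_controller, wait_seconds=None):
    """Return the grace period in seconds that the rules in this topic imply.

    job_controller: the JobController value, or None if it is not defined.
    wait_seconds:   ControlWaitPeriod in seconds, or None if it is not defined.
    Raises ValueError where the topic says that the service is not loaded.
    """
    if job_controller is None:
        if wait_seconds is not None:
            raise ValueError("ControlWaitPeriod is defined but JobController is not")
        return None  # neither parameter is set; not covered by this topic
    if wait_seconds is None or wait_seconds == 0:
        return DEFAULT_SECONDS  # missing or zero values fall back to 2 minutes
    if not 0 < wait_seconds <= MAX_SECONDS:
        raise ValueError("ControlWaitPeriod is outside the 0 to 1 hour range")
    return wait_seconds
```

For example, `effective_grace_period("gracefulshutdown", 0)` returns 120, which matches the case where the service is terminated after 2 minutes even though a zero period was configured.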