Cloud Foundry deployment fails on a specific virtual machine job

Symptoms

When you deploy Cloud Foundry, the deployment fails on a specific virtual machine job and an error similar to the following message is displayed:

Started updating job debian_nfs_server > debian_nfs_server/0 (c9a7c3c2-1d3b-41d2-a14c-f3c4f1ec742d) (canary)..... Done (00:01:13)
 Started updating job consul > consul/0 (7ade5ffb-a450-46e3-b94a-c1f8be310ef7) (canary)..... Done (00:01:08)
 Started updating job nats > nats/0 (76fe6c8e-1d21-4422-a407-8f79a78e03b7) (canary)...... Done (00:01:15)
 Started updating job ccdb_ng > ccdb_ng/0 (d4b559cb-b1e8-4b20-87de-37182e2b30ec) (canary)..... Done (00:01:10)
 Started updating job uaadb > uaadb/0 (e68e8783-049b-4f10-95c4-e922d2eccff6) (canary)..... Done (00:01:11)
 Started updating job router > router/0 (20d6e777-2ffe-42f7-8fbd-8b1675c5fa3c) (canary).... Done (00:00:47)
 Started updating job dea_next > dea_next/0 (75dae58e-de27-4f61-b396-1618fcc1ac04) (canary)..... Done (00:01:05)
 Started updating job cc_core > cc_core/0 (87de372a-ac40-471c-b621-b4a6ac3ca602) (canary)........................................ Failed: 'cc_core/0 (87de372a-ac40-471c-b621-b4a6ac3ca602)' is not running after update. Review logs for failed jobs: cloud_controller_ng, cloud_controller_worker_local_1, cloud_controller_worker_local_2, nginx_cc, cloud_controller_migration, cloud_controller_worker_1, cloud_controller_clock (00:08:48)
Error 400007: 'cc_core/0 (87de372a-ac40-471c-b621-b4a6ac3ca602)' is not running after update. Review logs for failed jobs: cloud_controller_ng, cloud_controller_worker_local_1, cloud_controller_worker_local_2, nginx_cc, cloud_controller_migration, cloud_controller_worker_1, cloud_controller_clock
Task 9 error

In this message, the cc_core/0 instance is not running after the update, and the first failed job in the list is cloud_controller_ng.

Resolving the problem

  1. Log in to the BOSH director with the administrative credentials.
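
    For example, with the BOSH CLI v2 you can log in by using the environment alias that the later steps use (IBMCloudPrivate in this document; your alias might differ):

    bosh -e IBMCloudPrivate log-in
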
  2. Ensure that your BOSH environment and deployment are set. If the BOSH CLI is not installed, install it first.
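
    You can verify both with the BOSH CLI v2; the environment alias and deployment name below match the later steps and might differ in your installation:

    bosh environments
    bosh -e IBMCloudPrivate deployments
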
  3. Determine the instance ID of your cc_core virtual machine. Run the following command:

    bosh -e IBMCloudPrivate -d Bluemix vms
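
    In the output, find the cc_core row, which shows the full instance ID and the process state. The following row is illustrative only; the ID, state, and IP address differ in your environment:

    cc_core/87de372a-ac40-471c-b621-b4a6ac3ca602  failing  10.0.16.19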
    
  4. To determine which process failed, run the following commands:

    bosh -e IBMCloudPrivate -d Bluemix ssh cc_core/<uuid>
    sudo su
    monit summary
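
    If the monit command is not found, use its full path, which on BOSH stemcells is typically the following location:

    /var/vcap/bosh/bin/monit summary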
    

    The output resembles the following text:

    The Monit daemon 5.2.5 uptime: 27m

    Process 'unbound'                          running
    Process 'consul_agent'                     running
    Process 'cloud_controller_ng'              Execution failed
    Process 'cloud_controller_worker_local_1'  initializing
    Process 'cloud_controller_worker_local_2'  initializing
    Process 'nginx_cc'                         initializing
    Process 'cloud_controller_migration'       Does not exist
    Process 'cloud_controller_worker_1'        Does not exist
    File 'nfs_mounter'                         accessible
    Process 'cloud_controller_clock'           Does not exist
    Process 'statsd-injector'                  running
    Process 'route_registrar'                  running
    Process 'loginserver'                      running
    Process 'uaa'                              running
    Process 'mod_vms'                          running
    Process 'metron_agent'                     running
    System 'system_localhost'                  running
    

    In this example, the cloud_controller_ng process failed.

  5. Locate the directory for the failed process:

    cd /var/vcap/sys/log/<failed_process> && ls -l
    

    Where <failed_process> is the name of the process that failed. The output resembles the following listing:

    -rw-r--r--  1 root root 20250 Aug 22 17:30 cloud_controller_ng_ctl.err.log
    -rw-r--r--  1 root root 24159 Aug 22 17:30 cloud_controller_ng_ctl.log
    -rw-r--r--  1 root root 33988 Aug 22 17:30 cloud_controller_worker_ctl.err.log
    -rw-r--r--  1 root root 21678 Aug 22 17:30 cloud_controller_worker_ctl.log
    -rw-r-----  1 root root   506 Aug 22 17:00 pre-start.stderr.log
    -rw-r-----  1 root root     0 Aug 22 17:00 pre-start.stdout.log
    
  6. Review each log in this directory until you find the error.
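
    The *.err.log files typically capture standard error from the process control scripts and are often the fastest place to find the failure reason. To search all logs in the directory at once, you can use a pattern such as the following one (adjust the pattern as needed):

    grep -inE 'error|fail|fatal' *.log | less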

  7. Correct the error.
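
    After you correct the underlying problem, restart the failed process and confirm that it reaches the running state. A typical sequence, run as root on the cc_core virtual machine with cloud_controller_ng as the failed process from this example, resembles the following commands:

    monit restart cloud_controller_ng
    monit summary

    Repeat the monit summary command until the process reports running.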