Resolving Bluemix application push failures – application startup errors

This is the last post in this series on application push errors. The first post covered client- and fabric-related errors, and the second covered application staging errors. Once the application gets past those stages, the final step is for it to start and run. If that fails, you will typically see output like the following:

-----> Uploading droplet (14M)

0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
Start app timeout

Sometimes the message is “Start unsuccessful” instead of a timeout. Either way, the underlying issue is that Bluemix thinks the application failed to start. Before I go through the possible causes, it’s important to gather as much information as possible for diagnosis. Again, the cf logs command is your friend, and you can run it to tail the logs in a separate console window. Sometimes it is useful to turn on tracing for the runtime or your application, if that’s supported. For example, for Liberty, you can change the trace specification in server.xml and push a server directory or packaged server that includes it, which gives you detailed trace data from the Liberty components you are interested in.

Now let’s look at the common causes.

Taking too long to start

When this happens, you may see messages like the following in the log:

2015-04-29T12:35:49.43-0400 [STG/27]     OUT -----> Uploading droplet (14M)
2015-04-29T12:35:54.37-0400 [DEA/27] OUT Starting app instance (index 0) with guid ceb4f93b-6306-4842-8637-1d1731412bdc
2015-04-29T12:37:06.75-0400 [DEA/27] ERR Instance (index 0) failed to start accepting connections
2015-04-29T12:37:06.76-0400 [API/8] OUT App instance exited with guid ceb4f93b-6306-4842-8637-1d1731412bdc payload: {"cc_partition"=>"default", "droplet"=>"ceb4f93b-6306-4842-8637-1d1731412bdc", "version"=>"d237ca74-f30a-41fc-afd8-fe8f66152698", "instance"=>"b7e9b891ddd7474f828412bd1d7bb329", "index"=>0, "reason"=>"CRASHED", "exit_status"=>-1, "exit_description"=>"failed to accept connections within health check timeout", "crash_timestamp"=>1430325426}
2015-04-29T12:37:07.00-0400 [App/0] ERR
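
In a crash record like the one above, the useful fields are buried in a long payload. Here is a minimal Python sketch that pulls out the crash reason, exit status, and exit description from such a line. The function name crash_fields is made up for illustration, and the payload format is assumed to match the sample shown here, so treat it as a starting point rather than a complete parser.

```python
import re

# Matches the three payload fields of interest; values are either
# double-quoted strings or (possibly negative) integers.
_FIELD = re.compile(r'"(reason|exit_status|exit_description)"=>("([^"]*)"|-?\d+)')


def crash_fields(log_text):
    """Return the crash-related payload fields found in log_text as a dict."""
    fields = {}
    for key, raw, quoted in _FIELD.findall(log_text):
        # Quoted values stay strings; bare values are parsed as integers.
        fields[key] = quoted if raw.startswith('"') else int(raw)
    return fields
```

Feeding it the output of cf logs lets you quickly tell a health check timeout apart from other crash reasons.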

The default timeout for application startup in Bluemix is 60 seconds. You can increase it up to a limit of 180 seconds with the -t option when you push the application (for example, cf push <app-name> -t 180). Usually, if an application with a route mapped does not listen on the given port (specified in the PORT environment variable) within the timeout, it gets killed. Below are some possible reasons for a long startup:

  • Too much initialization during startup
    The application may do a lot of time-consuming work during startup, for example, loading lots of data from disk or a database. There are two solutions to this. One is to refactor the application so that it does lazy and/or asynchronous initialization. The other is to push the application with the --no-route option, then run cf map-route once the initialization is done.
  • Listening on the wrong port
    Sometimes the application is simply not listening on the designated port. Check the code or the runtime configuration to make sure it’s listening on $PORT.
  • Timing out while reaching an external network
    If the application needs to reach an external network resource during startup, check the connectivity. Security Group configurations may come into play here too, so make sure they are not blocking the required network connections.
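
The first two points above can be combined in one pattern: bind to $PORT right away and do the slow work in the background. The following is a minimal Python sketch of that idea, not a recommended production setup. The names slow_initialization and current_response are hypothetical, the 30-second delay is only a stand-in for real startup work, and the sketch assumes the health check only needs the port to accept connections, as the "failed to accept connections" message suggests.

```python
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the slow startup work has finished.
ready = threading.Event()


def listen_port(environ=os.environ):
    """Port assigned by the platform; 8080 is only a fallback for local runs."""
    return int(environ.get("PORT", "8080"))


def slow_initialization(delay_seconds=30.0):
    """Hypothetical stand-in for expensive startup work, such as loading data."""
    time.sleep(delay_seconds)
    ready.set()


def current_response():
    """Status code and body to serve, depending on whether startup is done."""
    if ready.is_set():
        return 200, b"ready\n"
    return 503, b"still initializing\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = current_response()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Bind first so the port accepts connections right away...
    server = HTTPServer(("0.0.0.0", listen_port()), Handler)
    # ...then do the slow work in the background while requests are served.
    threading.Thread(target=slow_initialization, daemon=True).start()
    server.serve_forever()
```

Calling main() from the application's entry point starts the server. Requests that arrive before initialization completes get a 503 instead of hanging, and the application is no longer at risk of being killed for not listening in time.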

Consuming too much memory

The application gets killed if its container uses more memory than allocated. By default, a Bluemix application gets 1G of memory for each of its instance containers. Note that this is the total memory used by the container, not just the application process. If your application needs more than 1G, you can use the -m option to specify a larger value. Sometimes the application has a memory leak that needs to be fixed.
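
To see how close the application gets to its limit, it can help to log memory usage from inside the process. The sketch below is one way to do that in Python, under two assumptions: that the container exposes its limits in the VCAP_APPLICATION environment variable (Cloud Foundry puts a "limits" object with a "mem" value in megabytes there), and that the peak resident set size of the application process is a useful approximation. It measures only this process, not the whole container, so the real usage will be somewhat higher. The function names are made up for illustration.

```python
import json
import os
import resource
import sys


def memory_limit_mb(environ=os.environ):
    """Return the container memory limit in MB, or None if it is not exposed."""
    raw = environ.get("VCAP_APPLICATION")
    if not raw:
        return None
    limits = json.loads(raw).get("limits", {})
    return limits.get("mem")


def peak_rss_mb():
    """Peak resident set size of this process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def memory_report(environ=os.environ):
    """One-line summary suitable for writing to the application log."""
    limit = memory_limit_mb(environ)
    used = peak_rss_mb()
    if limit is None:
        return f"peak RSS {used:.1f}M (no limit found in VCAP_APPLICATION)"
    return f"peak RSS {used:.1f}M of {limit}M ({100 * used / limit:.0f}%)"
```

Printing memory_report() at the end of startup, and periodically afterwards, makes a steadily growing number easy to spot in the cf logs output.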

Consuming too much disk

The application fails to write to disk if its container consumes more disk space than allocated, which may cause the application to malfunction. The default disk size of the container is 1G in Bluemix. You can increase it to a maximum of 2G using the -k push option. It’s usually an anti-pattern to use the container’s local storage for persistence. A Bluemix persistence service such as Cloudant or SQLDB should be used instead.

Premature application exit

A Bluemix web application usually should not exit by itself. If it does, Bluemix will kill the container and then spin up another one. Check your application startup logic to make sure it’s not exiting prematurely during startup. For example, is it missing a required service binding? If the log does not provide enough information, such issues are very challenging to diagnose: you see the application continually starting and then being killed without much of a clue, because the application container does not stay around and you cannot issue commands like cf files to examine it. There are several ways to keep the container alive in such a situation:

  • Leveraging runtime exit hook
    Some runtimes, such as the IBM JRE used by the Liberty buildpack in Bluemix, have an exit hook that can be used to keep the runtime process alive in the event of a failure. This prevents the application container from being killed. To use this hook with the IBM JRE, you can issue cf set-env <app-name> JVM_ARGS -Xdump:tool:events=vmstop,exec="sleep 1d" and then do cf restart <app-name>. This tells the IBM JRE to execute the command sleep 1d when the JVM exits. You can change this command to any other valid shell command, including a curl call that uploads some data to an external service. To make this hook even more useful, you can add more JVM options like -Xdump:heap+java:events=vmstop, which will trigger a heap dump at the same time.
  • Modifying application startup command
    This trick works with any buildpack or runtime: simply append a sleep command to the application’s start command. You can find a buildpack’s default start command in the output of a successful application push, like below:
    -----> Uploading droplet (161M)

    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    1 of 1 instances running

    App started


    App test was started using this command `.liberty/initial_startup.rb`

    Showing health and status for app test in org test / space test as test...

    requested state: started
    instances: 1/1
    usage: 1G x 1 instances
    last uploaded: Tue Jun 16 16:01:56 UTC 2015
    stack: lucid64

         state     since                    cpu    memory         disk           details
    #0   running   2015-06-16 12:04:16 PM   1.8%   257.5M of 1G   257.3M of 1G

    With the above example, you can then issue the push command below:
    cf push <app-name> -c ".liberty/initial_startup.rb;sleep 1d" --no-route
    This command makes the container sleep for one day after the runtime process exits.
  • Using the application management utility of Bluemix buildpacks
    If you are using the IBM node.js or Liberty buildpack in Bluemix, there is a very convenient way to diagnose such application crash issues. You can simply do cf set-env <app-name> BLUEMIX_APP_MGMT_ENABLE devconsole+shell and then cf restart <app-name>. You will find that even though the application process may still exit, the container stays. You can restart the application process in a web console, or even get shell access to the container. This gives you full power to dig into the issue, as if you were working in your own desktop environment. For more information, refer to the recent blogs introducing this feature for both the node.js and Liberty buildpacks.
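
Whichever way you keep the container alive, a premature exit is much easier to diagnose if the application says why it is exiting. Coming back to the missing service binding example, here is a small Python sketch of a startup check. It relies on the VCAP_SERVICES environment variable, which maps each bound service label to a list of instances. The function names and the service labels used with them are placeholders, not names from any particular Bluemix service.

```python
import json
import os
import sys


def missing_services(required_labels, environ=os.environ):
    """Return the required service labels that have no bound instance."""
    bound = json.loads(environ.get("VCAP_SERVICES") or "{}")
    return [label for label in required_labels if not bound.get(label)]


def check_bindings(required_labels, environ=os.environ):
    """Log each missing binding to stderr; return True when all are present."""
    missing = missing_services(required_labels, environ)
    for label in missing:
        # Anything written to stderr shows up as ERR lines in cf logs.
        print(f"startup check: no bound service with label '{label}'",
              file=sys.stderr)
    return not missing
```

Running check_bindings with the labels your application depends on, before any other startup work, turns a silent crash loop into an explicit line in the log.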

This post concludes the series. We covered the kinds of errors that you may encounter during an application push: client errors, fabric errors, application staging errors, and application startup errors. I presented the same topic at the last Cloud Foundry Summit. You can watch my talk on YouTube, or download the slides I used from SlideShare. Hope you like it!