Resolving Bluemix application push failures – application startup errors


This is the last post of this series on application push errors. The first post covered client and fabric errors, and the second covered application staging errors. Once an application gets past those, the final step is for it to actually start and run. If that fails, you will typically see error messages like the following:

-----> Uploading droplet (14M)

0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
Start app timeout

Sometimes it may say “Start unsuccessful” instead of a timeout. Either way, the underlying issue is that Bluemix thinks the application failed to start. Before I go through the possible causes, it’s important to gather as much information as possible for diagnosis. Again, the cf logs command is your friend, and you can run it to tail the logs in a separate console window. It can also be useful to turn on tracing in the runtime or in your application, if that’s supported. For example, for Liberty, you can change the trace specification in server.xml and push a server directory or package that includes it, which gives you detailed trace data from the Liberty components you are interested in.

Now let’s look at the common causes.

Taking too long to start

When this happens, you may see the following messages in the log:

2015-04-29T12:35:49.43-0400 [STG/27]     OUT -----> Uploading droplet (14M)
2015-04-29T12:35:54.37-0400 [DEA/27]     OUT Starting app instance (index 0) with guid ceb4f93b-6306-4842-8637-1d1731412bdc
2015-04-29T12:37:06.75-0400 [DEA/27]     ERR Instance (index 0) failed to start accepting connections
2015-04-29T12:37:06.76-0400 [API/8]      OUT App instance exited with guid ceb4f93b-6306-4842-8637-1d1731412bdc payload: {"cc_partition"=>"default", "droplet"=>"ceb4f93b-6306-4842-8637-1d1731412bdc", "version"=>"d237ca74-f30a-41fc-afd8-fe8f66152698", "instance"=>"b7e9b891ddd7474f828412bd1d7bb329", "index"=>0, "reason"=>"CRASHED", "exit_status"=>-1, "exit_description"=>"failed to accept connections within health check timeout", "crash_timestamp"=>1430325426}
2015-04-29T12:37:07.00-0400 [App/0]      ERR

The default timeout for application startup in Bluemix is 60 seconds. You can increase it up to a limit of 180 seconds by specifying the -t option when you push the application. Usually, if an application with a route mapped does not listen on the given port (specified in the PORT environment variable) within the timeout, it gets killed. Below are some possible reasons for a long startup:

  • Too much initialization during startup. The application may do a lot of time-consuming work during startup, for example, loading large amounts of data from disk or a database. There are two solutions to this. One is to refactor the application so that it does lazy and/or asynchronous initialization. The other is to start the application using the --no-route option, then run map-route when the initialization is done.

  • Listening on the wrong port. Sometimes the application is simply not listening on the designated port. Check the code or the runtime configuration to make sure it’s listening on $PORT.

  • Timing out while reaching an external network. If the application needs to reach an external network resource during startup, check the connectivity. Security Group configurations may come into play here too, so make sure they are not blocking the required network connections.
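To make the first two points concrete, here is a minimal sketch in Python of an application that binds to the port from the PORT environment variable right away and performs its slow initialization in a background thread. The names `slow_initialization`, `make_server` and `main`, the fallback port 8080, and the 503 response while loading are all assumptions made for illustration, not something a Bluemix buildpack prescribes.

```python
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the slow initialization has finished.
ready = threading.Event()


def slow_initialization():
    """Stand-in for time-consuming startup work (hypothetical)."""
    time.sleep(0.2)  # e.g. loading data from disk or a database
    ready.set()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer 503 until initialization is done, 200 afterwards.
        status = 200 if ready.is_set() else 503
        body = b"ready\n" if status == 200 else b"still initializing\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port):
    """Bind to the port first, then start initializing in the background."""
    server = HTTPServer(("0.0.0.0", port), Handler)
    threading.Thread(target=slow_initialization, daemon=True).start()
    return server


def main():
    # The port to listen on comes from the PORT environment variable;
    # 8080 is only an assumed fallback for local runs.
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

Calling `main()` from the application's entry point starts the server. Because the socket is bound before the slow work begins, the application is already accepting connections well within the startup timeout, even if the initialization itself takes longer.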

Consuming too much memory

The application gets killed if its container uses more memory than allocated. By default, a Bluemix application gets 1 GB of memory for each of its instance containers. Note that this is the total memory used by the container, not just by the application process. If your application needs more than 1 GB, you can use the -m option to specify a larger value. Sometimes the application may have a memory leak that needs to be fixed.

Consuming too much disk

The application fails to write to disk if its container consumes more disk space than allocated, which may cause the application to malfunction. The default disk size of a container in Bluemix is 1 GB. You can increase it to a maximum of 2 GB using the -k push option. Using the container’s local storage for persistence is usually an anti-pattern. A Bluemix persistence service such as Cloudant or SQLDB should be used instead.
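When diagnosing either limit, it can help to log at startup what the container was actually given. The sketch below assumes that the VCAP_APPLICATION environment variable carries a `limits` object with `mem` and `disk` values in megabytes, as Cloud Foundry generally provides; the helper name `container_limits` and the sample value are hypothetical and used only for illustration.

```python
import json


def container_limits(vcap_application):
    """Return (memory_mb, disk_mb) from a VCAP_APPLICATION JSON string.

    Either value is None when the field is absent.
    """
    limits = json.loads(vcap_application or "{}").get("limits", {})
    return limits.get("mem"), limits.get("disk")


# Sample value for illustration only; inside a container you would read
# os.environ.get("VCAP_APPLICATION") instead.
sample = '{"limits": {"mem": 1024, "disk": 1024, "fds": 16384}}'
mem_mb, disk_mb = container_limits(sample)
print(f"memory limit: {mem_mb} MB, disk limit: {disk_mb} MB")
```

Comparing these numbers with the memory and disk columns in the cf app output shows how close an instance is running to its quota.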

Premature application exit

A Bluemix web application usually should not exit by itself. If it does, Bluemix kills the container and then spins up another one. Check your application startup logic to make sure it’s not exiting prematurely during startup. For example, is it missing a required service binding? When the log does not provide enough information, such issues are very challenging to diagnose: you see the application continually starting and then being killed without much of a clue, because the application container does not stay around and you cannot issue commands like cf files to examine it. There are several ways to keep the container alive in such a situation:

  • Leveraging a runtime exit hook. Some runtimes, such as the IBM JRE used by the Liberty buildpack in Bluemix, have an exit hook that can be used to keep the runtime process alive in the event of failures. This prevents the application container from being killed. To use this hook with the IBM JRE, you can issue cf set-env <app-name> JVM_ARGS -Xdump:tool:events=vmstop,exec="sleep 1d" and then cf restart <app-name>. This tells the IBM JRE to execute the command sleep 1d when the JVM exits. You can change this command to any other valid shell command, including a curl call that uploads some data to an external service. To make this hook even more useful, you can add more JVM options such as -Xdump:heap+java:events=vmstop, which triggers a heap dump at the same time.

  • Modifying the application startup command. This trick works with any buildpack or runtime: simply append the sleep command to the application’s start command. You can find a buildpack’s default start command in the output of a successful application push, like the following:

    -----> Uploading droplet (161M)

    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    0 of 1 instances running, 1 starting
    1 of 1 instances running

    App started

    App test was started using this command `.liberty/initial_startup.rb`

    Showing health and status for app test in org test / space test as test...

    requested state: started
    instances: 1/1
    usage: 1G x 1 instances
    last uploaded: Tue Jun 16 16:01:56 UTC 2015
    stack: lucid64

         state     since                    cpu    memory         disk           details
    #0   running   2015-06-16 12:04:16 PM   1.8%   257.5M of 1G   257.3M of 1G


    With the above example, you can then issue the following push command:

    cf push <app-name> -c ".liberty/initial_startup.rb;sleep 1d" --no-route

    With this command, the container sleeps for one day after the runtime process exits, which keeps it alive for inspection.

  • Using the application management utility of Bluemix buildpacks. If you are using the IBM node.js or Liberty buildpack in Bluemix, there is a very convenient way to diagnose such application crashes. Simply run cf set-env <app-name> BLUEMIX_APP_MGMT_ENABLE devconsole+shell and then cf restart <app-name>. You will find that even though the application process may still exit, the container stays. You can restart the application process from a web console, or even get shell access to the container. This gives you full power to dig into the issue, as if you were working in your own desktop environment. For more information, refer to the recent blogs introducing this feature for both the node.js and Liberty buildpacks.
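On the earlier question of a missing service binding, an explicit startup check can turn a silent premature exit into a clear log line. The following is a rough sketch, assuming that VCAP_SERVICES is a JSON object keyed by service label; the function names and the labels used in the usage note are hypothetical.

```python
import json
import sys


def missing_services(vcap_services, required):
    """Return the required service labels that have no bound instance."""
    bound = json.loads(vcap_services or "{}")
    return [label for label in required if not bound.get(label)]


def check_bindings(vcap_services, required):
    """Log each missing binding to stderr; return True if none are missing."""
    missing = missing_services(vcap_services, required)
    for label in missing:
        print(f"startup check: no bound instance of service '{label}'",
              file=sys.stderr)
    return not missing
```

An application could call `check_bindings(os.environ.get("VCAP_SERVICES"), ["cloudantNoSQLDB"])` before doing anything else (with `os` imported), so that the reason for an early exit shows up in the cf logs output. The label here is only an example and should be replaced by whatever label your bound service actually uses.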

This post concludes the series. We covered all kinds of errors that you may encounter during an application push, including client errors, fabric errors, application staging errors and application startup errors. I presented the same topic at the last Cloud Foundry Summit. You can watch my talk on YouTube, or download the slides I used from SlideShare. Hope you like it!
