Preparing an application to run on the cloud is becoming a common task. How difficult that task is varies widely, depending on how your application is written. A common distinction that has emerged over the last few years is between applications that are "cloud-ready" versus "cloud-centric" (sometimes called "born on the cloud").
Essentially, an application is cloud-ready if it can be effectively deployed into either a public or private cloud. This means that the application needs to be designed in such a way that it can take advantage of the capabilities provided by the PaaS (Platform as a Service) layer on which it runs. Likewise, the application should not break because of design limitations that collide with assumptions made in the PaaS layer.
It is for this reason that many developers push toward replacing traditional applications with entirely new applications that are built to be cloud-centric. These applications are often built using different tools and runtimes than traditional applications. For example, if an application is being entirely redeveloped for the cloud, it might replace a relational database with a NoSQL database, like Cloudant or MongoDB.
However, you don't have to go so far as to abandon your entire existing tool and runtime suites. If you follow some simple rules in your application design, you can usually make your existing applications cloud-ready without having to go through an entire re-implementation. You can use these same rules as criteria for ranking your existing applications for migration to a dynamic cloud environment.
Here are nine rules for making your application cloud-ready:
1. Don't code your application directly to a specific topology
A key benefit of many cloud platform services is that they allow for immediate scalability changes in the application. This might be through true dynamic scalability, such as with the virtual applications in IBM PureApplication System, or by manually resizing the number of instances of an application – adding dynos in Heroku, or adding Warden containers in Cloud Foundry. The principle to remember is that if your topology can change, it will change. This is a radical shift! In a traditional environment, the application might assume a particular deployment topology (for example, a two-node WebSphere "golden topology") and might even embed assumptions for host names and host IP addresses. None of these assumptions are workable in a cloud application – hostnames, IP addresses, and the number of application nodes in use can all change at a moment's notice. Assumptions about where "singletons" reside in your topology can be especially problematic. If all the other nodes try to reach out to that particular node and it's not there - or even worse, if there are two of them - what happens to your application?
What to do instead:
The first rule must be to keep your application from being affected by dynamic scaling: build your application to be as generic and stateless as possible. If you must use a singleton, then make sure that you have a voting protocol in place so that the remaining nodes can recreate the singleton if it dies, and keep a permanent backup of the singleton's state in a shared repository, such as a database.
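The singleton-recovery idea can be sketched with a lease held in a shared store. This is a minimal illustration, not any particular product's API: `SharedStore` stands in for a database table or distributed cache, and the lease expiry plays the role of the voting protocol that lets a surviving node take over.

```python
# Sketch: singleton recovery via an expiring lease in a shared store.
# SharedStore is a stand-in for a real database or distributed cache;
# all names here are illustrative assumptions, not a specific product API.

class SharedStore:
    """Holds the singleton lease: (owner, expiry time)."""
    def __init__(self):
        self.lease = None

    def try_acquire(self, node_id, ttl, now):
        # Claim the lease if it is free or has expired, or renew our own.
        if self.lease is None or self.lease[1] <= now:
            self.lease = (node_id, now + ttl)
            return True
        return self.lease[0] == node_id

def elect_singleton(store, node_ids, now):
    """Each surviving node tries to claim the lease; exactly one wins."""
    winners = [n for n in node_ids if store.try_acquire(n, ttl=30, now=now)]
    return winners[0] if winners else None

store = SharedStore()
# two nodes race; one becomes the singleton
leader = elect_singleton(store, ["node-a", "node-b"], now=0)
# the leader's VM disappears; once the lease expires, a survivor takes over
new_leader = elect_singleton(store, ["node-b"], now=100)
```

In a real deployment the singleton's working state would also be checkpointed to the shared repository, so the new leader can resume where the old one stopped.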
2. Do not assume the local file system is permanent
Since a node can be moved, taken away, or duplicated at any time, you can't make any assumptions about the longevity of files written to the file system. Suppose an application uses the local file system as a cache of frequently accessed information. If the node is shut down and then restarted at a different location in a different VM, then that cache will disappear – leading to very different response times from different nodes in your topology.
What to do instead:
Instead of using the local file system as a store for temporary information, put temporary information in a remote store such as an SQL or NoSQL database. Be aware that reading static information from a file system is fine; for example, your application can read a configuration or properties file as long as each node has the same files in the same or an equivalent directory structure. It's writing unique files to the file system that gets you into trouble.
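The difference can be shown with a small cache sketch. Here a plain dict stands in for a remote store such as Redis or a database; the point is only that the data outlives any single node, so a replacement node sees the same cache.

```python
# Sketch: keep temporary/cache data in a remote store (a dict stands in
# for Redis or a database) so a replacement node sees the same data.

class RemoteCache:
    def __init__(self, backing_store):
        self.store = backing_store   # shared store, outlives any one node

    def get(self, key, loader):
        if key not in self.store:
            self.store[key] = loader(key)   # cache miss: compute and share
        return self.store[key]

shared = {}                          # stands in for the remote store
node1 = RemoteCache(shared)
node1.get("greeting", lambda k: "hello")

# node1's VM is destroyed; a fresh node attaches to the same remote store
node2 = RemoteCache(shared)
survived = node2.get("greeting", lambda k: "recomputed")
```

Had the cache been a local file, `node2` would have started cold and shown very different response times than its siblings.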
3. Don't keep session state in your application
Statefulness of any sort limits the scalability of an application - not just storing state on the local file system, but even storing permanent state in local memory. Unless the application can recover seamlessly from the removal of any node, and rebalance work instantaneously on the addition of a node, then the application will have a hard time functioning in a cloud environment.
For many applications, the hardest type of state to eliminate is session state. It's so hard to eliminate that trying to do so entirely is often a fool's errand. While it might be possible to store some state in the client browser in modern web applications (using the facilities in HTML5, for example), it's usually better to minimize the impact of that state by storing it in a centralized location, on the server. But you need to be careful in implementing that recommendation. In Java applications, HttpSession state is often stored in memory, which presents a problem if your entire application server can be added or removed at any time. Ruby on Rails uses a similar mechanism with its session hash, and the same issues apply.
What to do instead:
If you can't eliminate session state entirely, the best practice is to push it out to a highly available store that is external to your application server; that is, put it in a distributed caching store like IBM WebSphere Extreme Scale, Redis, or Memcached, or in an external database (either a traditional SQL database, or a NoSQL database).
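The pattern looks like this in miniature. The dict backend is a stand-in for an external store such as WebSphere eXtreme Scale, Redis, or Memcached; because the state is keyed by session ID in a shared store, any node can serve any request.

```python
import json
import uuid

# Sketch: session state lives in an external store (a dict stands in for
# Redis/Memcached), so any application server node can serve any request.

class SessionStore:
    def __init__(self, backend):
        self.backend = backend

    def save(self, session_id, state):
        self.backend[session_id] = json.dumps(state)  # serialize for the wire

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw else {}

backend = {}
sid = str(uuid.uuid4())

# request 1 lands on node A
SessionStore(backend).save(sid, {"cart": ["book"]})
# request 2 lands on node B, which the autoscaler just added
restored = SessionStore(backend).load(sid)
```

If node A is removed between the two requests, nothing is lost: the session never lived in node A's memory.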
4. Don't log to the file system
If you write your logs to the local file system, then what happens when a crash is so serious that it takes out the entire container or VM where your application was running? Or what if your PaaS layer decides to scale down your application and remove the VM or container entirely? In both cases, you've just lost valuable information for debugging problems, especially problems that begin long before the first symptom is seen by the user.
What to do instead:
For this reason, many PaaS layers such as Heroku, Cloud Foundry, and PureApplication System provide log aggregators that redirect log output to a remote service. You might also prefer an open source aggregator like Scribe or Apache Flume, or a commercial product like Splunk. In any case, in a dynamic cloud environment it's critical to have your logs available on a service that outlives the nodes the logs were generated on. One thing to be aware of here is the destination of your logs while you are logging: most logging frameworks offer different log levels that let you control how much information is logged. If you know that your log output will be directed across the network, you may want to minimize the overhead of that traffic by adjusting the log level to produce a manageable volume.
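Both halves of that advice can be sketched with Python's standard `logging` module. The custom handler below is a stand-in for a network log shipper or PaaS log drain (an assumption for illustration), and the level setting shows how DEBUG chatter is kept off the wire.

```python
import logging

# Sketch: route logs to a central handler (AggregatorHandler stands in for
# a remote aggregator such as Flume or a PaaS log drain) and set the level
# so only a manageable volume crosses the network.

shipped = []

class AggregatorHandler(logging.Handler):
    """Stand-in for a handler that ships records over the network."""
    def emit(self, record):
        shipped.append(self.format(record))

log = logging.getLogger("app")
log.addHandler(AggregatorHandler())
log.setLevel(logging.INFO)          # DEBUG is filtered before shipping

log.debug("cache probe %s", 1)      # suppressed: never crosses the network
log.info("order %s accepted", 42)   # shipped to the aggregator
```

The same shape applies to Log4j or java.util.logging: the application logs normally, and configuration decides both destination and volume.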
5. Don't assume any specific infrastructure dependency
This is a general principle that has several manifestations. For example, you should not assume that the services your application calls are located at particular hostnames or IP addresses. Service-oriented architectures have been widely adopted in recent years, but it is still common to find applications that embed the details of the service endpoints they call. When those called peers or services can be relocated or regenerated within your cloud environment - and thereby shift to new host names and IP addresses - the calling applications' code breaks.
What to do instead:
Abstracting environment-specific dependencies into a set of property files is an improvement, but it is still inadequate: you end up constantly updating and redeploying those properties files as the environment changes. Because applications need to be more resilient in a cloud environment, they should be agnostic about where the services they call are located. A better approach is to consult an external service registry to resolve service endpoints, or to delegate the entire routing function to a service bus or a load balancer with a virtual name.
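The registry-lookup approach can be sketched as follows. The dict stands in for a real registry (or a virtual name on a service bus or load balancer); the caller names the service and resolves an endpoint at call time, so relocation changes only the registry.

```python
# Sketch: resolve service endpoints through a registry at call time instead
# of hard-coding hosts. The dict stands in for a real service registry or
# a load balancer's virtual name; these names are illustrative assumptions.

registry = {"inventory": ["10.0.1.7:9080", "10.0.2.9:9080"]}

def resolve(service_name):
    """Return a live endpoint for the named service, or fail loudly."""
    endpoints = registry.get(service_name)
    if not endpoints:
        raise LookupError(f"no endpoints registered for {service_name!r}")
    return endpoints[0]   # a real client would load-balance and retry

first = resolve("inventory")

# the cloud relocates the service: only the registry entry changes,
# not a single line of calling code
registry["inventory"] = ["10.0.3.4:9080"]
second = resolve("inventory")
```

The calling application never embeds a hostname, so regenerated or relocated peers no longer break it.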
6. Don't use infrastructure APIs from within your application
This is a rule with wide applicability because an "infrastructure API" can refer to a lot of different layers in your software stack. For example, many Java developers still create their own threads and manage their own thread pools – even though concepts like the WorkManager API have been around for many years. The key advantage to avoiding a low-level infrastructural API is realized when the time comes to monitor your application – existing monitoring tools will know about managed thread pools, but if you've created your own then the cloud's monitoring tools will be unable to aid you in discovering thread bottlenecks.
At the configuration level, ISVs are often inclined to make their applications as self-contained as possible. If they know that an application needs the TCP connection timeout at a low value, the app or its launcher script may verify or set this network option. A better practice now is for the application to delegate these requirements to the scripting that prepares the cloud environment for the application. The point here is to limit the range of APIs used in the application code, and shift responsibility for infrastructure services to the provider so that layers of the infrastructure – that is, the operating system image – can be updated without impact to the application.
In the management space, we've seen developers build application code that queries and manipulates the IBM WebSphere Application Server infrastructure through JMX APIs. This is great provided you precisely control your infrastructure. But suppose that as part of your cloud migration you move to a lightweight application server like the WebSphere Liberty profile. There you'd have a different set of MBeans with different capabilities. That becomes another part of your code that has to change. As we look to cloud APIs such as OpenStack, the opportunity presents itself to try even more exotic options like building your own autoscaling through manipulating the OpenStack Nova APIs.
What to do instead:
This is possibly the most difficult of potential problems to remedy. Once you start making assumptions about the infrastructure that your application runs on, it makes changing that infrastructure more challenging. So think about why your application code is calling an infrastructure service or API – is this something that could move to the PaaS layer? In the JMX situation above, the code queried the JMX APIs to provide a dashboard for application performance – something that a vendor solution like ITCAM could do more easily and portably. So the question to ask is: are there existing open source or commercial products that you can rely on instead? Your application should be concerned with solving the business problem it's aimed at, and not with manipulating the infrastructure it runs on. Leave PaaS solutions in the PaaS layer and keep them out of your application code.
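The thread-pool case from rule 6 has a direct analog in most languages. As a sketch (Python's `ThreadPoolExecutor` here, playing the role that WorkManager or a managed executor plays in Java), the application submits work to a pool the platform owns, rather than creating and tracking raw threads itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: delegate thread management to a managed pool (ThreadPoolExecutor
# here; WorkManager or a managed executor in Java EE) instead of creating
# raw threads. fetch_price is a hypothetical stand-in for a remote call.

def fetch_price(sku):
    return {"A1": 10, "B2": 25}[sku]

# the pool is sized, monitored, and tuned outside the application logic
with ThreadPoolExecutor(max_workers=4) as pool:
    prices = list(pool.map(fetch_price, ["A1", "B2"]))
```

Because the pool is a managed resource, monitoring tools can see its queue depth and utilization; home-grown threads are invisible to them.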
7. Don't use obscure protocols
There are so many interesting protocols out there, and interesting packages built on top of them. The trouble is, they often require special configuration and tuning for resiliency – and resiliency is something you really need in the cloud if you are going to add and remove nodes under load. Why build your own database connection model if the platform can provide it? Applications based on HTTP, SSL, and standard database, queuing, and web service connections will be more resilient in the long term, because they delegate the configuration repertoire to the platform.
What to do instead:
If your application is using any older or non-standard protocols, now is the time to take this “silver lining” opportunity to modernize and standardize. For example, EJBs using IIOP were cool at the turn of the millennium, but the world has since moved on. Moving to an HTTP-based infrastructure based on standards like REST (or even the older SOAP and WS-* standards) will make it easier to port your system to a new environment, and will also enable additional business opportunities provided by API management. Finally, you might want to consider that asynchronous protocols (such as IBM MQ, or newer options like MQTT) are still alive and well and can be extremely effective for many styles of application programming. Rather than trying to make HTTP into something it isn't (like a reliable messaging system) make sure to take a minimalist approach and apply the right tool for the job.
8. Don't rely on OS-specific features
It will come as no surprise that applications that use standards-based services and APIs are more portable to cloud environments than those that rely on specific operating system features. We often see a tendency to use OS-specific features when a higher-level, OS-neutral version is available. A simple example is scheduling work to be done. Many application servers, such as WebSphere Application Server, build scheduling services directly into their APIs, and open source options like Quartz are also readily available. Yet many developers still invoke Java programs from OS-level schedulers like cron. This works fine if your application is running on Linux or another UNIX derivative, but if you move to Windows, then you are out of luck. The principle works the other way, too: why assume that the Windows Event Service is available to your application, and preclude running in a Linux cloud?
What to do instead:
In some cases you can remediate this by using compatibility libraries that make one operating system "look" like another. Cygwin is a good example of a compatibility library that provides a set of Linux tools in a Windows environment, while Mono is a good example of a compatibility library going the other way to provide .NET capabilities in Linux. However, the best practice is to avoid the OS-specific dependencies as much as you can and rely instead on services provided by your middleware infrastructure or your service providers.
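The scheduling example can be made concrete with a small sketch. Here the standard-library `sched` module plays the role that Quartz or an application server's scheduler service would play in Java: the recurring job lives inside the application, so it runs identically on Linux and Windows with no cron or Task Scheduler dependency.

```python
import sched
import time

# Sketch: schedule recurring work inside the application (stdlib sched here;
# Quartz or an app-server scheduler service in Java) instead of relying on
# cron or the Windows Task Scheduler, which ties you to one OS.

runs = []
scheduler = sched.scheduler(time.monotonic, time.sleep)

def purge_expired_sessions():
    runs.append("purged")   # stands in for real maintenance work

# run the job twice, shortly apart, on any OS that runs Python
scheduler.enter(0.01, 1, purge_expired_sessions)
scheduler.enter(0.02, 1, purge_expired_sessions)
scheduler.run()
```

The schedule itself becomes configuration the middleware owns, not a dependency on whichever operating system happens to host the node.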
9. Don't manually install your application
Cloud environments are quite likely to be created and destroyed more frequently than traditional environments, so your application will need to be installed frequently and on demand. It follows that the installation process must be scripted and completely reliable, with configuration data externalized from the scripts. This has some ramifications: don't assume a user is present to accept a license agreement, and don't assume that a user will be available to choose among N different configuration options.
What to do instead:
At a minimum, capture your application installation as a set of operating-system-level scripts. If your middleware platform provides a built-in scripting mechanism (such as the Jython scripts available for WebSphere Application Server) then by all means take advantage of that. Keeping your application installation small and portable makes it easier for you to adapt to different automation techniques such as Chef, Puppet, or patterns in PureApplication System.
Ideally, you should also minimize the dependencies required by the application installation. For example, what is your minimum configuration? Does the database really need to be available to install the application? Or would a better option be for the application to be able to start without its database, report the problem, and then increase function when the database becomes available?
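That degraded-startup idea can be sketched like this. Everything here is illustrative: `connect_db` is a hypothetical hook for your real database driver, and configuration comes from environment variables so that the startup is scripted and unattended.

```python
# Sketch: unattended startup that reads configuration from the environment
# and comes up in degraded mode if the database is not yet available.
# connect_db is a hypothetical hook; swap in your real driver call.

def connect_db(url):
    if not url:
        raise ConnectionError("database unavailable")
    return {"url": url}          # stands in for a live connection

def start_app(env):
    config = {"db_url": env.get("DB_URL", "")}   # externalized, no prompts
    try:
        db = connect_db(config["db_url"])
        return {"status": "up", "db": db}
    except ConnectionError:
        # start anyway, report the problem, add function when the DB appears
        return {"status": "degraded", "db": None}

cold_start = start_app({})                                   # DB not there yet
full_start = start_app({"DB_URL": "db2://db:50000/APP"})     # DB available
```

An application that starts in a known degraded state and reports its missing dependency is far easier to deploy on demand than one whose installer blocks waiting for the database.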
These few simple rules will help you determine what it will take to get your applications ready for the cloud. If you're building an application that is "born on the cloud" then you should take these rules to heart and incorporate them directly into your application. If you're getting ready to move your application onto a cloud environment for the first time, then taking the time to think about these rules and making the critical adjustments is a key first step along that road.
Many thanks to Bobby Woolf for his patient review and editing of this article, and to James Kochuba for his helpful suggestions.