11 tips for scaling servers in the cloud

How gracefully can your apps scale to meet demand?

Our company recently started building applications for clients with large user bases. That meant moving away from a traditional server setup toward servers that can scale to handle hundreds of simultaneous requests, which in turn required me to learn how the traditional setup differs from today's more modern approach.

Making the move to the cloud

Back in the "old" days (a whole couple of decades ago), adding server capacity meant getting someone to install physical hardware and then paying people to maintain it. On top of the cost of buying and maintaining that hardware, you paid for the physical space to house it. Because any server room had limited space, growing quickly could create problems.

Today, with a few clicks of your mouse, you can get a server in the cloud. Someone else takes care of maintaining the hardware, and you configure your new server remotely from wherever you are. You can also add many servers without needing any additional real estate, which makes scaling up much easier than it used to be.

Traditional setup

A traditional setup consists of a single server that hosts your application along with a separate database server. This setup usually works fine for small projects and for projects that don't need to scale. And there are times when it doesn't make sense to over-optimize your setup. Our company hosts a number of projects this way.

We're in the process of moving some of our larger projects over to a more modern infrastructure that will allow our applications to scale gracefully.

Building a scalable infrastructure

We're working on building scalable applications on a scalable infrastructure. In our quest to become an even more awesome development shop, we decided to follow the Twelve-Factor App methodology, which originated at Heroku and is specifically designed for building SaaS (software as a service) apps. In short, the methodology keeps your application easy to scale by keeping its pieces separate. You should be able to pull out one piece of the puzzle without worrying that you'll break everything else. In other words, your components should not depend on each other so tightly that moving or changing one of them brings down your infrastructure.

The Twelve-Factor App methodology is easy to follow. Not all of our projects are there yet, but we're working to get as close as we can. When it comes to scaling, pay particular attention to factor 9 on disposability: Maximize robustness with fast startup and graceful shutdown.

Best practices

You can try to optimize everything, or you can spend your time efficiently and go for the low-hanging fruit. Our lead infrastructure engineer likes the 80/20 approach: 20 percent of the work will get you 80 percent of the way to your goal. The tips below cover the simple things that can get you close to where you want to be.

1. Set up load balancing

Scenario A: Your webpage was featured on the front page of Reddit and suddenly you're getting a crazy amount of traffic. What do you do?

This is where a load balancer and several running servers can help. Think of the load balancer as an air traffic controller: it takes all incoming traffic and directs each request to a server that can handle it. It also checks the health of your servers, so if one goes down, it stops sending traffic in that direction and new requests won't end up hitting a dead server. For the most part, you can treat the load balancer as a black box: you don't need to worry about how the balancing gets done, only that it does.
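To make the air traffic controller idea concrete, here is a minimal sketch of round-robin load balancing that skips unhealthy servers. It's a toy in-process model, not how any particular cloud load balancer is implemented: the Server and RoundRobinBalancer names are hypothetical, and the healthy flag is flipped by hand here, where a real balancer would set it from periodic health checks.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True  # a real load balancer would update this from health checks


class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = itertools.cycle(self.servers)

    def pick(self):
        # Check each server at most once per request so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers available")


servers = [Server("web-1"), Server("web-2"), Server("web-3")]
balancer = RoundRobinBalancer(servers)
servers[1].healthy = False  # simulate web-2 crashing
print([balancer.pick().name for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Round robin is only one balancing strategy; managed load balancers also offer options such as least-outstanding-requests, but the key behavior for scaling is the same: dead servers drop out of the rotation automatically.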

Scenario B: A natural disaster has struck the entire East Coast. How will your application react?

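If this were to happen, everything should be fine, because you will have spun up servers in more than one region, right? Availability zones are isolated locations within a single region, so spreading servers across zones alone won't protect you from a coast-wide outage. With servers in multiple regions and a routing layer that spans them, such as DNS failover or a global load balancer, traffic will stop going to your East Coast servers once they're all down and start going to your West Coast servers instead.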
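Cross-region failover is usually configured as a priority list: send everything to the primary region while it has healthy capacity, and fall back to the next region when it doesn't. DNS services such as Amazon Route 53 offer a failover routing policy along these lines. The sketch below models only that decision; the Region class, the route function, and the region names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy_servers: int  # servers currently passing health checks


def route(regions):
    """Return the first region, in priority order, that still has healthy servers.

    This mimics a primary/secondary failover policy: all traffic goes to the
    primary region until it has no healthy capacity left.
    """
    for region in regions:
        if region.healthy_servers > 0:
            return region.name
    raise RuntimeError("all regions are down")


regions = [Region("us-east", healthy_servers=0), Region("us-west", healthy_servers=4)]
print(route(regions))  # us-west
```

Note that the secondary region needs enough capacity (or autoscaling headroom) to absorb the primary's traffic, or failover just moves the outage.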

2. Keep different environments looking the same

We have four different environments at our company: local, development, staging, and production. We try to keep them as similar to each other as possible, which prevents problems down the line that can be blamed on differences between environments. For example, it's important that you don't rely on anything in your development environment that you won't have access to in production.
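One common way to keep environments alike, recommended by the Twelve-Factor config factor, is to run identical code everywhere and vary only configuration supplied through environment variables. The sketch below assumes that approach; the setting names in REQUIRED_SETTINGS and the load_config helper are hypothetical. Failing at startup when a setting is missing keeps a developer machine from quietly depending on something production won't have.

```python
import os


class ConfigError(RuntimeError):
    pass


# Hypothetical setting names; list whatever your app actually needs.
REQUIRED_SETTINGS = ("DATABASE_URL", "CACHE_URL", "SECRET_KEY")


def load_config(environ=os.environ):
    """Read settings from environment variables and fail fast if any are missing."""
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise ConfigError(f"missing required settings: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_SETTINGS}


# Each environment supplies its own values; the code never changes.
config = load_config({
    "DATABASE_URL": "postgres://db.internal/app",
    "CACHE_URL": "redis://cache.internal:6379/0",
    "SECRET_KEY": "dev-only-secret",
})
```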

3. Use stateless servers

As a rule, don't store state that you need to access later (such as user sessions or uploaded files) on your web application servers. Every copy of the server should look the same, and you shouldn't need to copy information from one server to the next. Ideally, you'd build one image of your application server and use it to spin up as many servers as you need. Keep your database and your application servers separate.
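In a Django app, one easy way to accidentally make servers stateful is to keep sessions in per-server storage: file-based sessions, or cache-backed sessions on Django's default local-memory cache, which lives inside each server process. The settings fragment below is a sketch of one alternative, assuming a shared Redis instance reachable from every app server (the REDIS_URL variable name is hypothetical). Uploaded files deserve the same treatment and belong in shared object storage rather than on a server's local disk.

```python
# settings.py (fragment): keep session state out of the web server process
import os

CACHES = {
    "default": {
        # Built-in Redis backend, available in Django 4.0 and later.
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        # Hypothetical env var name; point every app server at the same Redis.
        "LOCATION": os.environ.get("REDIS_URL", "redis://127.0.0.1:6379"),
    }
}

# Sessions are written to the shared database and cached in the shared Redis,
# so any app server can handle any user's next request.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```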

4. Stop your servers often

Emergency drills may seem silly at times, but practice makes perfect. It's critical that everyone knows how to react in an emergency and has practiced what to do before one actually happens. In case of fire, people might know they're supposed to head for an emergency exit, but you don't want them to forget what an emergency exit looks like when the fire is real.

The same is true for server problems. You might think you've prepared for the worst, but unless you know how your system actually reacts when servers go down, you may not be ready when you're not the one bringing them down. This is why stateless servers are critical: don't get attached to your servers, and bring them down often. You should be able to stop servers and bring them back up without being afraid that you're going to break something.
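A drill like this can be as simple as picking a random server and stopping it. The sketch below shows that idea with a dry-run default so nothing is stopped by accident. The terminate_random_instance helper is hypothetical, and the function that actually stops a server is passed in so the drill logic can be tested on its own. On AWS, for example, that function could wrap boto3's EC2 client, as in lambda iid: ec2.terminate_instances(InstanceIds=[iid]); if the instance belongs to an Auto Scaling group, the group launches a replacement to restore its desired capacity.

```python
import random


def terminate_random_instance(instance_ids, terminate, rng=random, dry_run=True):
    """Pick one instance at random and stop it, the way a chaos drill would.

    `terminate` is whatever function actually stops a server in your
    environment; it is injected so the drill can run in tests or as a dry run.
    """
    if not instance_ids:
        return None
    victim = rng.choice(list(instance_ids))
    if dry_run:
        print(f"[dry run] would terminate {victim}")
    else:
        terminate(victim)
    return victim
```

Running the drill during working hours, while people are around to watch, is a sensible way to start before trusting the system to survive the middle of the night.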

You want to feel secure knowing that a few of your servers can go down in the middle of the night and the system will be okay until you're able to address the problem in the morning.

5. Zero in on bottlenecks

Fixing bottlenecks starts with figuring out what's actually causing your application to slow down. Don't chase things that might not be real problems; find the biggest bottlenecks and spend your time fixing those.

We use Django at our company, so we like to use the Django Debug Toolbar to see what's actually slowing us down. A common culprit is querying the database more often than necessary, for example by running one extra query for every item in a list. Getting rid of bottlenecks like these can go a long way toward making your applications perform better.
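The "one extra query per item" pattern is often called the N+1 query problem. In Django, QuerySet.select_related() avoids it for foreign keys by fetching related rows with a JOIN, and the Debug Toolbar's SQL panel shows the query count that gives the problem away. The sketch below reproduces the pattern with plain sqlite3 so it runs without Django; the author and book tables and the CountingConnection wrapper are made up for illustration.

```python
import sqlite3


class CountingConnection:
    """Wrap a sqlite3 connection and count how many queries are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
        INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO book VALUES (1, 'Notes', 1), (2, 'Compilers', 2), (3, 'COBOL', 2);
    """)
    return CountingConnection(conn)


def titles_with_authors_n_plus_one(db):
    # One query for the books, then one more query per book: N + 1 in total.
    books = db.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    result = []
    for title, author_id in books:
        (name,) = db.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        result.append((title, name))
    return result


def titles_with_authors_join(db):
    # A single JOIN fetches the same data in one round trip.
    return db.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


slow, fast = setup(), setup()
assert titles_with_authors_n_plus_one(slow) == titles_with_authors_join(fast)
print(slow.queries, fast.queries)  # 4 1
```

With three books the difference is trivial, but the N+1 version grows by one query per row, which is exactly the kind of slowdown that only shows up under real data and real traffic.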

6. Run background tasks

If you can put off doing something until later, consider using a background task. For example, if you need to make an expensive API call but don't have to return its result to the user immediately, you can run the call in the background and respond right away. We use the Celery task queue to manage work that can wait. Whenever users don't need instant feedback, we offload the task so it can get the attention it deserves without holding up the request.
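The core pattern is simple: the request handler puts a job on a queue and returns, and a separate worker picks the job up. Celery does this across processes and machines through a message broker (there you'd decorate the function with @app.task and enqueue it with .delay()). The sketch below is a single-process stand-in using only the standard library, not Celery itself; send_welcome_email and handle_signup are hypothetical names, and queued jobs are lost if the process dies, which is one reason real task queues use an external broker.

```python
import queue
import threading


def send_welcome_email(user_id, outbox):
    # Stand-in for a slow call to a third-party API (hypothetical).
    outbox.append(f"welcome email sent to user {user_id}")


def worker(tasks):
    """Pull (function, args) pairs off the queue and run them until told to stop."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel value: shut down cleanly
            tasks.task_done()
            break
        func, args = item
        try:
            func(*args)
        except Exception as exc:  # keep the worker alive if one task fails
            print(f"task failed: {exc!r}")
        finally:
            tasks.task_done()


def handle_signup(user_id, tasks, outbox):
    """Request handler: enqueue the slow work and respond right away."""
    tasks.put((send_welcome_email, (user_id, outbox)))
    return {"status": "ok", "user_id": user_id}


tasks = queue.Queue()
outbox = []
threading.Thread(target=worker, args=(tasks,), daemon=True).start()
print(handle_signup(42, tasks, outbox))  # does not wait for the email to be sent
tasks.join()  # wait for the background work (only needed in this demo)
print(outbox)  # ['welcome email sent to user 42']
```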

7. Cache what you can

If your site serves up a lot of static content, caching will speed things up. Repeat visitors will have to download only the content that has changed, so if not much has changed since their last visit, the page will load very quickly.

You can cache in several places, including the user's browser and a content delivery network (CDN), which serves content from servers closer to your users with high availability and performance. You can set expiration dates on assets, and it's a good idea to set longer expiration dates on things that change infrequently than on things that change often. For example, a logo image will probably change rarely, while your HTML pages will change more frequently. Caching improves performance without requiring you to change your hardware.
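Expiration is usually expressed with the HTTP Cache-Control header, which both browsers and CDNs honor. The sketch below picks a header value by file type: a long max-age for rarely changing assets such as images and stylesheets, and no-cache (store the page, but revalidate it before reuse) for HTML. The one-year and five-minute lifetimes, the extension list, and the cache_control_for helper are assumptions for illustration; the long lifetime is only safe if you change an asset's file name when its content changes. In Django, the cache_control view decorator and django.utils.cache.patch_cache_control set the same header.

```python
import posixpath

# Rarely changing assets: cache for a year. This assumes file names change
# whenever content changes (for example, logo.3f9a2c.png), forcing a fresh download.
LONG_LIVED = "public, max-age=31536000, immutable"
# HTML changes often: caches may store it but must revalidate before reuse.
REVALIDATE = "no-cache"
# Anything else: a short default lifetime (an assumed five minutes).
SHORT_LIVED = "public, max-age=300"

STATIC_EXTENSIONS = {".png", ".jpg", ".svg", ".css", ".js", ".woff2"}


def cache_control_for(path):
    """Return a Cache-Control header value for the given URL path."""
    _, ext = posixpath.splitext(path.lower())
    if ext in STATIC_EXTENSIONS:
        return LONG_LIVED
    if ext in {".html", ""}:  # pages such as /about have no extension
        return REVALIDATE
    return SHORT_LIVED


print(cache_control_for("/img/logo.png"))  # public, max-age=31536000, immutable
print(cache_control_for("/index.html"))    # no-cache
```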

8. Set up autoscaling

In the cloud, you can spin up new servers with the click of a button and destroy them just as easily. That lets you make sure you always have enough servers for the traffic you're handling. Autoscaling adds servers automatically when load rises and removes them when it falls, and if you know your users' traffic patterns, you can also schedule servers to spin up before peak times and wind down during quiet periods.
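Managed autoscalers, such as AWS Auto Scaling with its target tracking policies and scheduled actions, handle this for you, but the arithmetic behind a target-tracking rule is roughly proportional. The sketch below shows a simplified version of that rule plus a time-based minimum for known busy hours; the CPU target, the server limits, the business-hours window, and both function names are assumptions, and real services add cooldowns and warm-up periods on top.

```python
import math


def desired_capacity(current_servers, current_cpu, target_cpu=60.0,
                     min_servers=2, max_servers=20):
    """Target-tracking style rule: size the fleet so average CPU lands near target.

    If 4 servers average 90% CPU and the target is 60%, the same load spread
    over ceil(4 * 90 / 60) = 6 servers should average about 60%.
    """
    if current_servers <= 0:
        return min_servers
    wanted = math.ceil(current_servers * current_cpu / target_cpu)
    # Clamp so a bad metric can't scale you to zero or run up a huge bill.
    return max(min_servers, min(max_servers, wanted))


def scheduled_minimum(hour, peak_hours=range(9, 18), peak_min=6, off_peak_min=2):
    """Time-based floor for known traffic patterns (hypothetical business hours)."""
    return peak_min if hour in peak_hours else off_peak_min


print(desired_capacity(4, 90.0, min_servers=scheduled_minimum(10)))  # 6
print(desired_capacity(6, 10.0, min_servers=scheduled_minimum(23)))  # 2
```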

9. Involve your whole team

Everyone on your team can help with performance and scaling to some degree. There are things everyone can do to make it easier to put safeguards in place so that your application can handle sudden spikes or sustained heavy traffic.

Make sure that everyone on the team is asking the right questions. For example, are certain features really needed or are they just nice to have?

10. Make time for testing

It's important to build timelines with room to test your infrastructure, because not everything is going to go smoothly from day one. You need to stress-test your application, and you need to see how it behaves in a production environment. If scaling tests aren't built into your timeline, you may discover huge problems late, and fixing them will cause delays.
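Dedicated load-testing tools are the usual choice for stress tests, but the basic idea fits in a few lines: fire many requests concurrently and record errors and latency. The sketch below is a minimal harness under that assumption; run_load_test and hit_staging are hypothetical names, the staging URL is a placeholder, and the 95th percentile is a simple approximation rather than a precise statistic.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, total_requests=100, concurrency=10):
    """Call `send_request` `total_requests` times using `concurrency` threads.

    `send_request` should perform one request and return True on success.
    Returns the error count and approximate latency percentiles in milliseconds.
    """
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed_call(_):
        start = time.perf_counter()
        try:
            ok = bool(send_request())
        except Exception:  # timeouts and HTTP errors count as failures
            ok = False
        return ok, (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(ms for _, ms in results)
    return {
        "requests": total_requests,
        "errors": sum(1 for ok, _ in results if not ok),
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
    }


def hit_staging():
    # Placeholder URL: aim load tests at a staging copy, never production by accident.
    with urllib.request.urlopen("https://staging.example.com/health", timeout=5) as resp:
        return resp.status == 200
```

Calling run_load_test(hit_staging, total_requests=500, concurrency=50) against a staging copy, then raising the concurrency step by step, shows where errors and latency start to climb, which is exactly what you want to learn before launch rather than after.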

11. Consider containers

Our company is planning to move to an even more modern approach to handling our web applications: we're hoping to switch our infrastructure over to containers.

When I first heard about containers, I wondered what the difference was between a container and a virtual machine; they sounded really similar to me. It turns out that containers are often a better fit than virtual machines for keeping applications scalable. They provide similar isolation, but without the bloat of a full guest operating system running on virtualized hardware. Instead, the containers on a host share a single Linux kernel, so they start faster and use fewer resources. That makes containers a lighter-weight, more cost-effective option for scaling applications.


While these practices may not prevent all of your problems, they'll help you start to build things with scaling in mind and give you a good foundation for when you have to handle hundreds of millions of users.

To ensure that you're on your way to having a highly scalable application infrastructure, you should do the following:

  • Use a load balancer.
  • Use autoscaling.
  • Don't be afraid to stop your servers randomly.
  • Fix what you need to fix, not what you think you should fix.
  • Make sure you have the time you need to test the app and fix any problems.
