March 7, 2023
Modern applications are constantly changing, evolving with new requirements and running in an environment with varying demands on resources. Scaling an application sizes it appropriately to resource demands, which keeps customers happy and reduces infrastructure costs. If you don’t know how to scale efficiently, you are not just doing a disservice to your application, you are putting unnecessary stress on your operations team. Manually trying to determine when to scale up or out is extremely difficult. If you buy enough infrastructure to accommodate your peak traffic, you will overspend whenever load is below peak. If you target your average load, spikes in traffic will degrade your application performance and, when traffic drops below average, resources will still go unused.
Scaling out, or horizontal scaling, contrasts with scaling up, or vertical scaling. The idea of scaling cloud resources may be intuitive. As your cloud workload changes, it may be necessary to increase infrastructure to support increasing load, or it may make sense to decrease infrastructure when demand is low. The “up or out” part is perhaps less intuitive. Scaling out is adding more equivalently functional components in parallel to spread out a load. This would be going from two load-balanced web server instances to three instances. Scaling up, in contrast, is making a component larger or faster to handle a greater load. This would be moving your application from a virtual machine (VM) with 2 CPUs to one with 3 CPUs. For completeness, scaling down refers to decreasing your system resources, regardless of whether you were using the up or out approach.
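The difference between the two directions can be sketched with a toy model. This is only an illustration: the `Deployment` class and the `scale_out` and `scale_up` functions are hypothetical names invented here, not part of any cloud provider's API. The numbers mirror the examples above: two web server instances growing to three, and a 2-CPU VM growing to 3 CPUs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Deployment:
    """Hypothetical model of a load-balanced service."""
    instances: int          # number of identical instances behind the load balancer
    cpus_per_instance: int  # size of each individual instance

    @property
    def total_cpus(self) -> int:
        return self.instances * self.cpus_per_instance


def scale_out(d: Deployment, extra_instances: int = 1) -> Deployment:
    """Horizontal scaling: add more identical instances in parallel."""
    return replace(d, instances=d.instances + extra_instances)


def scale_up(d: Deployment, extra_cpus: int = 1) -> Deployment:
    """Vertical scaling: make each existing instance larger."""
    return replace(d, cpus_per_instance=d.cpus_per_instance + extra_cpus)


web = Deployment(instances=2, cpus_per_instance=2)
out = scale_out(web)  # 3 instances of 2 CPUs each
up = scale_up(web)    # 2 instances of 3 CPUs each
print(out, out.total_cpus)
print(up, up.total_cpus)
```

In this toy case both paths end with the same total CPU count. What differs is how the capacity is arranged: more components of the same size, or the same number of larger components.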
Resources such as CPU, network, and storage are common targets for scaling up. The goal is to increase the resources supporting your application to reach or maintain adequate performance. In a hardware-centric world, this might mean adding a larger hard drive to a computer for increased storage capacity. It might mean replacing the entire computer with a machine that has more CPU capacity and a more performant network interface. If you are managing a non-cloud system, this scaling up process can take anywhere from weeks to months as you request, purchase, install, and finally deploy the new resources.
In a cloud system, the process should take seconds or minutes. A cloud system might still target physical hardware, which puts it at the slow end of that range, on the order of tens of minutes. But virtualized systems dominate cloud computing, and some scaling actions, like increasing storage volume capacity or spinning up a new container to scale a microservice, can take seconds to deploy. What is being scaled is not that different. You may still shift an application to a larger VM, or it may be as simple as allocating more capacity on an attached storage volume.
Regardless of whether you are dealing with virtual or hardware resources, the take-home point is that you are moving from one smaller resource and scaling up to one larger, more performant resource.
Scaling up makes sense when you have an application that needs to sit on a single machine. If you have an application that has a loosely coupled architecture, it becomes possible to easily scale out by replicating resources.
Scaling out a microservices application can be as simple as spinning up a new container running a web server app and adding it to the load balancer pool. When scaling out, the idea is that you can add identical services to a system to increase performance. Systems that support this model also tolerate the removal of resources when the load decreases. This allows greater fluidity in scaling resource size in response to changing conditions.
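A minimal sketch of that add-and-remove behavior follows. It assumes, purely for illustration, that every replica handles a fixed number of requests per second. The names `desired_replicas` and `resize_pool`, the `web-N` naming scheme, and the default limits are all hypothetical. A real system would call its orchestrator or load balancer API instead of editing a list.

```python
import math


def desired_replicas(requests_per_sec, capacity_per_replica,
                     min_replicas=1, max_replicas=10):
    """Return how many identical replicas the current load calls for."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_replicas, min(max_replicas, needed))


def resize_pool(pool, target):
    """Return a new pool with replicas added or removed to reach target."""
    pool = list(pool)  # copy so the caller's pool is left unchanged
    while len(pool) < target:
        pool.append(f"web-{len(pool) + 1}")  # scale out: add a replica
    return pool[:target]                     # scale in: drop the extras


pool = ["web-1", "web-2"]
pool = resize_pool(pool, desired_replicas(250, capacity_per_replica=100))
print(pool)  # three replicas while load is high
pool = resize_pool(pool, desired_replicas(80, capacity_per_replica=100))
print(pool)  # back to one replica once load drops
```

Because every replica is interchangeable, the same function handles growth and shrinkage. That is what lets a loosely coupled system follow demand in both directions.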
The incremental nature of the scale-out model is of great benefit when considering cost management. Because components are identical, cost increments should be relatively predictable. Scaling out also provides greater responsiveness to changes in demand. Typically, services can be rapidly added or removed to best meet resource needs. This flexibility and speed effectively reduce spending, because you use (and pay for) only the resources needed at the time.
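The cost argument can be made concrete with some back-of-the-envelope arithmetic. Everything in this sketch is an assumption chosen for illustration: the hourly load profile, the per-instance capacity, and the price of 10 cents per instance-hour are made-up numbers, not real pricing. It compares provisioning for peak load all day against scaling out hour by hour.

```python
import math


def instances_needed(load, capacity_per_instance):
    """Identical instances required to serve a given load (at least one)."""
    return max(1, math.ceil(load / capacity_per_instance))


def static_peak_cost(hourly_load, capacity, price_cents):
    """Cost of running enough instances for peak load during every hour."""
    peak_instances = instances_needed(max(hourly_load), capacity)
    return peak_instances * len(hourly_load) * price_cents


def scaled_out_cost(hourly_load, capacity, price_cents):
    """Cost of running only the instances each hour actually needs."""
    return sum(instances_needed(load, capacity) for load in hourly_load) * price_cents


# Hypothetical six-hour load profile in requests per second.
hourly_load = [100, 100, 400, 900, 400, 100]
capacity = 300    # assumed requests per second one instance can serve
price_cents = 10  # assumed price per instance-hour, in cents

peak = static_peak_cost(hourly_load, capacity, price_cents)
scaled = scaled_out_cost(hourly_load, capacity, price_cents)
print(f"provisioned for peak: {peak} cents")  # 18 instance-hours -> 180 cents
print(f"scaled out hourly:    {scaled} cents")  # 10 instance-hours -> 100 cents
```

Each extra instance adds the same fixed amount per hour, so the bill tracks the load curve rather than its peak.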