The rise of containerization technologies like Docker, and of systems for orchestrating and deploying containers like Kubernetes, has transformed traditional application architecture. Traditionally, an application was more of a monolith. Sure, you might have dedicated storage and database servers, but the various components of the application itself were tied into a single deployable object. With a microservice architecture, the application is broken down into separate pieces, called microservices, that are updated and deployed independently.
This microservice approach provides freedom and flexibility in the choice of programming language, as well as in the tools and frameworks used to administer the application. Another area of flexibility is where resources are dedicated within your architecture. Because of this modular design, the administrator can allocate resources to individual microservices strategically, minimizing wasted resources and maximizing the experience for the end user. This typically occurs in one of two ways: vertical or horizontal scaling.
Vertical scaling means increasing the resource limits (such as CPU and memory) on an individual Pod. Horizontal scaling, on the other hand, means increasing the number of Pods across the cluster. Depending on the microservice in question, resource limits may need to be increased on the individual Pod, but the beauty of horizontal scaling is that the Pods can be spread across the cluster, so if a problem occurs in one place, it does not take down every copy of the service. In the remainder of this post, we will walk through the process of scaling microservices horizontally in Kubernetes using both manual and automatic scaling.
Typically, the number of Pods for a given application is set during the planning stages. However, it's very difficult to gauge how popular an application (especially a new one) will be with its users. Because of this, manual intervention can sometimes be necessary. Thankfully, Kubernetes already provides the ability to scale applications manually.
Manual scaling can be done by updating the number of replicas in the deployment YAML and applying the changes:
kubectl apply -f deployment.yml
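For reference, the replica count lives in the spec.replicas field of the Deployment manifest. Here is a minimal sketch of what deployment.yml might contain. The name web, the namespace demo, and the nginx image are placeholders I picked for illustration, not values from a real application, and the namespace is assumed to exist already.

```shell
# Write a minimal Deployment manifest (all names below are hypothetical placeholders).
cat > deployment.yml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: demo
spec:
  replicas: 3          # change this number, then re-apply the file
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
EOF

# Applying the file requires access to a cluster, so it is left commented out here:
# kubectl apply -f deployment.yml
```

Keeping the replica count in the file means the desired size of the application is recorded alongside the rest of its configuration.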
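Alternatively, you can scale directly from the command line without editing the deployment YAML (note that re-applying the unchanged file later will set the replica count back to the value in the file):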
kubectl scale deployment/<DEPLOYMENT> --replicas=<NUMBER_OF_REPLICAS> -n <NAMESPACE>
Scaling down an application is done in the same way by reducing the replica count to the desired number.
Kubernetes also provides the ability to implement automatic scaling through the use of a Horizontal Pod Autoscaler (HPA). This is a fantastic feature of Kubernetes, especially since application traffic is rarely static. Implementing an HPA allows the application to scale up and down without the need for manual intervention. The criterion for scaling (down or up) is usually the observed CPU utilization, but an HPA can also take custom metrics into account.
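To get a feel for how an HPA arrives at a replica count, it helps to work through the arithmetic. The Kubernetes documentation describes the core calculation as the current replica count multiplied by the ratio of the observed metric to the target, rounded up. The sketch below reproduces only that ratio and clamps the result to the configured minimum and maximum. The function name desired_replicas is my own, and the real controller also applies a tolerance and stabilization behavior that this sketch leaves out.

```shell
# desired_replicas CURRENT_REPLICAS OBSERVED_CPU_PERCENT TARGET_CPU_PERCENT MIN MAX
# Hypothetical helper: a simplified version of the HPA calculation,
# ceil(current * observed / target), clamped to [MIN, MAX].
desired_replicas() {
  current=$1
  observed=$2
  target=$3
  min=$4
  max=$5

  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * observed + target - 1) / target ))

  # Respect the configured lower and upper bounds.
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi

  echo "$desired"
}

# 4 Pods averaging 90% CPU against a 60% target, bounds 1..10
desired_replicas 4 90 60 1 10
```

With 4 Pods at 90% against a 60% target, the ratio is 1.5, so the sketch asks for 6 Pods. With the same 4 Pods at 30%, it asks for 2.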
Create a Horizontal Pod Autoscaler for a deployment:
kubectl autoscale deployment <DEPLOYMENT> -n <NAMESPACE> --min <MIN_NUMBER_OF_PODS> --max <MAX_NUMBER_OF_PODS> --cpu-percent <PERCENT_OF_UTILIZATION>
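The same autoscaler can also be described declaratively and kept in version control next to the deployment. The sketch below assumes the autoscaling/v2 API is available on your cluster. The names web and demo and the numbers 2, 10, and 60 are placeholders of mine. CPU-based autoscaling also generally needs a metrics source (such as the metrics server) running in the cluster and CPU requests set on the target containers.

```shell
# Write a HorizontalPodAutoscaler manifest (names and numbers are hypothetical placeholders).
cat > hpa.yml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
EOF

# Applying and inspecting it requires access to a cluster, so these are left commented out:
# kubectl apply -f hpa.yml
# kubectl get hpa -n demo
```

Once an HPA manages a deployment, it is usually best to let it own the replica count rather than also setting the count by hand.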
Kubernetes has many wonderful features that have propelled it to be one of the most popular platforms in this modern IT era. I hope that this quick dive into scaling microservices in Kubernetes has piqued your interest in this great technology. If you would like to learn more about Kubernetes, check out my Learn Microservices by Doing course or the other great Kubernetes content at Linux Academy!