Kubernetes Horizontal Pod Autoscaler (HPA)
Kubernetes enables efficient management of distributed applications, and the Horizontal Pod Autoscaler (HPA) addresses the challenge of handling varying loads. HPA automatically scales pods based on CPU, memory, or custom metrics, ensuring optimal performance, availability, and minimal resource wastage, especially in unpredictable production environments.
In this article we will explore HPA, covering its functionality, setup, and troubleshooting.
What is Horizontal Pod Autoscaler (HPA)?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes component that automatically adjusts the number of pods in a deployment, replication controller, or replica set based on resource utilization or custom metrics. Unlike vertical scaling (which increases the resources of individual pods), horizontal scaling adds or removes pods to handle load changes, making it ideal for stateless applications.
HPA works by continuously monitoring metrics such as CPU or memory usage. For example, if CPU utilization exceeds a predefined threshold (e.g., 50%), HPA will scale out by creating additional pods. Conversely, if utilization drops, it scales in by terminating excess pods. This dynamic behavior ensures that your application maintains performance during peak loads while avoiding overprovisioning during low-traffic periods, ultimately saving costs and improving resource efficiency.
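The scale-out decision follows a simple formula (documented for the Kubernetes HPA): desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick sketch of that arithmetic in shell, using hypothetical observed values:

```shell
# HPA replica calculation: desiredReplicas = ceil(current * observed / target)
# The values below are hypothetical observations, not real cluster output.
current_replicas=3
observed_cpu=70   # average utilization across pods (%)
target_cpu=50     # averageUtilization from the HPA spec (%)

desired=$(awk -v r="$current_replicas" -v c="$observed_cpu" -v t="$target_cpu" \
  'BEGIN { d = r * c / t; if (d > int(d)) d = int(d) + 1; print d }')
echo "desired replicas: $desired"   # ceil(3 * 70 / 50) = ceil(4.2) = 5
```

Because the ratio scales with the gap between observed and target utilization, a pod running at exactly the target keeps the replica count unchanged.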
Setting Up HPA in Kubernetes
To demonstrate how HPA works, let’s set it up for a simple Nginx application. We’ll deploy Nginx, create a service to expose it, and configure HPA to scale based on CPU utilization.
Steps for Setup
- Deploy Nginx: Create a deployment to run Nginx.
- Expose the Application: Set up a service to make the deployment accessible.
- Configure HPA: Define an HPA resource to automate scaling.
Below are the YAML configurations and commands to achieve this.
YAML Configurations
Deployment YAML (nginx-deployment.yaml)
This configuration creates a deployment with one Nginx pod initially.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
Explanation: The replicas: 1 field specifies the initial number of pods. The selector ensures that the deployment manages pods with the label app: nginx. The container uses the latest Nginx image and exposes port 80.
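One caveat: with a Utilization target, HPA computes CPU usage as a percentage of the container's CPU request, and the deployment above sets none. For utilization-based autoscaling to report a target, add resource requests to the container spec — a sketch with illustrative values:

```yaml
# CPU-utilization targets are measured against the container's CPU request,
# so the nginx container needs resources set for HPA to compute a percentage.
# The request/limit values here are illustrative, not tuned recommendations.
containers:
- name: nginx
  image: nginx:latest
  ports:
  - containerPort: 80
  resources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 128Mi
```

Without a CPU request, kubectl get hpa typically shows unknown in the TARGETS column and no scaling occurs.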
Service YAML (nginx-service.yaml)
This service exposes the Nginx deployment using a LoadBalancer.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
Explanation: The selector field links the service to pods labeled app: nginx. The type: LoadBalancer ensures the service is accessible externally, though for local testing, you might use ClusterIP or port forwarding.
HPA YAML (nginx-hpa.yaml)
This configuration sets up HPA to scale the Nginx deployment based on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Explanation: The scaleTargetRef specifies the deployment to scale (nginx-deployment). minReplicas and maxReplicas set the scaling bounds (1 to 5 pods). The metrics section defines that HPA should monitor average CPU utilization and scale out when it exceeds 50%.
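CPU is not the only option: the same metrics list accepts memory (and, with extra adapters, custom metrics). A sketch of an additional entry for spec.metrics — the 70% threshold is illustrative; when several metrics are listed, HPA computes a replica count for each and uses the highest:

```yaml
# Extra entry for spec.metrics alongside the cpu metric (illustrative threshold):
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
```

As with CPU, utilization targets for memory are measured against the container's resource request.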
Applying the Configurations
Use the following commands to apply the configurations:
kubectl apply -f nginx-deployment.yaml
kubectl apply -f nginx-service.yaml
kubectl apply -f nginx-hpa.yaml
Verifying HPA Scaling
Once the configurations are applied, you can test and verify HPA’s scaling behavior.
Simulating Load
- Port-Forward the Service: Make the service accessible locally:
kubectl port-forward svc/nginx-service 8080:80
- Generate Load: Use curl or a load-testing tool to simulate traffic:
while true; do curl http://localhost:8080; sleep 0.1; done
This command continuously sends requests to the Nginx service, increasing CPU usage.
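Nginx serves static pages cheaply, so a single local curl loop may not push CPU past the threshold. An in-cluster load generator, modeled on the pattern in the Kubernetes HPA walkthrough, usually produces more pressure. This pod manifest is a sketch; the service name matches the one created above:

```yaml
# Throwaway load-generator pod; remove it with: kubectl delete pod load-generator
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox:1.36
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://nginx-service; done"]
```

Running the generator inside the cluster also exercises the service's cluster DNS name rather than the port-forward tunnel.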
Checking Scaling
Monitor the scaling behavior with these commands:
- Check pods:
kubectl get pods
- Check HPA status:
kubectl get hpa
You should see the number of pods increase if CPU utilization exceeds 50%. For example, the output of kubectl get hpa might look like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa Deployment/nginx-deployment 70%/50% 1 5 3 10m
This indicates that CPU usage is at 70% (above the 50% target), so HPA has scaled to 3 replicas.
Troubleshooting Common HPA Issues
Sometimes, HPA may not scale as expected due to issues with the metrics-server, which provides resource usage data. Common problems include the metrics-server pod not being ready or misconfigurations.
Potential Issues and Solutions
- Metrics-Server Not Ready: If kubectl get hpa shows no targets or errors, check the metrics-server.
- Check Metrics-Server Logs (append the actual pod-name suffix to the command):
kubectl logs -n kube-system metrics-server-
- View Events:
kubectl get events -n kube-system
- Verify API Server Access: Ensure the metrics-server can communicate with the Kubernetes API server. Network policies or RBAC issues might block access.
- Check Permissions: The metrics-server needs the correct RBAC permissions. Verify the role bindings:
kubectl get clusterrolebinding metrics-server
Ensure that the metrics-server cluster role binding exists and has the necessary permissions to access resource metrics.
- Fix Certificate Issues: For local clusters, certificate errors are common. Patch the metrics-server deployment to allow insecure TLS (this disables kubelet certificate verification, so use it only in local or development clusters):
kubectl patch deployment metrics-server -n kube-system \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"metrics-server","command":["/metrics-server","--kubelet-insecure-tls"]}]}}}}'
After patching, verify the pod status:
kubectl get pods -n kube-system
Once resolved, HPA should start scaling correctly.
Manual Scaling of Pods
If HPA isn’t scaling as expected or you need immediate adjustments, you can manually scale pods using kubectl scale.
Commands
- Scale Up (e.g., to 3 replicas):
kubectl scale deployment nginx-deployment --replicas=3
- Scale Down (e.g., to 1 replica):
kubectl scale deployment nginx-deployment --replicas=1
Manual scaling is useful for quick adjustments but should be used cautiously, as HPA will eventually override manual changes based on metrics.
Configuring Advanced Scaling Behavior: Stabilization Window
HPA includes advanced options like the stabilization window, which controls how quickly scaling decisions are made to prevent rapid oscillations (e.g., scaling up and down too frequently).
What is Stabilization Window?
The stabilization window specifies the time period HPA waits before making scaling decisions. For scaling down, a longer window prevents premature reduction of pods, ensuring stability.
Configuring Stabilization Window
You can modify the HPA using kubectl patch or by updating the YAML.
Using kubectl patch
To set a 60-second stabilization window for scaling down:
kubectl patch hpa nginx-hpa -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":60}}}}'
Equivalent YAML Update
Update the nginx-hpa.yaml
to include scaling behavior:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
Explanation: The stabilizationWindowSeconds: 60 setting ensures that HPA waits 60 seconds before scaling down, reducing the risk of rapid scaling oscillations. The Percent policy permits removing up to 100% of the excess replicas per 60-second period, and selectPolicy: Max picks the most permissive policy when several are listed.
Apply the updated YAML:
kubectl apply -f nginx-hpa.yaml
Conclusion
We've explored how to autoscale containers with the Kubernetes Horizontal Pod Autoscaler (HPA): deploying a sample Nginx application, configuring CPU-based scaling, verifying the behavior under load, troubleshooting metrics-server issues, and tuning scale-down stability. With these pieces in place, your workloads can track demand automatically instead of relying on manual scaling.