Kubernetes Horizontal Pod Autoscaler (HPA)
Kubernetes enables efficient management of distributed applications, and the Horizontal Pod Autoscaler (HPA) addresses the challenge of handling varying loads. HPA automatically scales pods based on CPU, memory, or custom metrics, ensuring optimal performance, availability, and minimal resource wastage, especially in unpredictable production environments.
In this article we will explore HPA, covering its functionality, setup, and troubleshooting.
What is Horizontal Pod Autoscaler (HPA)?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes component that automatically adjusts the number of pods in a deployment, replication controller, or replica set based on resource utilization or custom metrics. Unlike vertical scaling (which increases the resources of individual pods), horizontal scaling adds or removes pods to handle load changes, making it ideal for stateless applications.
HPA works by continuously monitoring metrics such as CPU or memory usage. For example, if CPU utilization exceeds a predefined threshold (e.g., 50%), HPA will scale out by creating additional pods. Conversely, if utilization drops, it scales in by terminating excess pods. This dynamic behavior ensures that your application maintains performance during peak loads while avoiding overprovisioning during low-traffic periods, ultimately saving costs and improving resource efficiency.
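The scale-out decision follows a simple formula (documented for the Kubernetes HPA): desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick sketch of that arithmetic in shell, using hypothetical observed values:

```shell
# HPA replica calculation: desiredReplicas = ceil(current * observed / target)
# The values below are hypothetical observations, not real cluster output.
current_replicas=3
observed_cpu=70   # average utilization across pods (%)
target_cpu=50     # averageUtilization from the HPA spec (%)

desired=$(awk -v r="$current_replicas" -v c="$observed_cpu" -v t="$target_cpu" \
  'BEGIN { d = r * c / t; if (d > int(d)) d = int(d) + 1; print d }')
echo "desired replicas: $desired"   # ceil(3 * 70 / 50) = ceil(4.2) = 5
```

Because the ratio scales with the gap between observed and target utilization, a pod running at exactly the target keeps the replica count unchanged.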
Setting Up HPA in Kubernetes
To demonstrate how HPA works, let’s set it up for a simple Nginx application. We’ll deploy Nginx, create a service to expose it, and configure HPA to scale based on CPU utilization.
Steps for Setup
- Deploy Nginx: Create a deployment to run Nginx.
- Expose the Application: Set up a service to make the deployment accessible.
- Configure HPA: Define an HPA resource to automate scaling.
Below are the YAML configurations and commands to achieve this.
YAML Configurations
Deployment YAML (nginx-deployment.yaml)
This configuration creates a deployment with one Nginx pod initially.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
Explanation: The replicas: 1 field specifies the initial number of pods. The selector ensures that the deployment manages pods with the label app: nginx. The container uses the latest Nginx image and exposes port 80.
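One caveat: with a Utilization target, HPA computes CPU usage as a percentage of the container's CPU request, and the deployment above sets none. For utilization-based autoscaling to report a target, add resource requests to the container spec — a sketch with illustrative values:

```yaml
# CPU-utilization targets are measured against the container's CPU request,
# so the nginx container needs resources set for HPA to compute a percentage.
# The request/limit values here are illustrative, not tuned recommendations.
containers:
- name: nginx
  image: nginx:latest
  ports:
  - containerPort: 80
  resources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 128Mi
```

Without a CPU request, kubectl get hpa typically shows unknown in the TARGETS column and no scaling occurs.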
Service YAML (nginx-service.yaml)
This service exposes the Nginx deployment using a LoadBalancer.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
Explanation: The selector field links the service to pods labeled app: nginx. The type: LoadBalancer ensures the service is accessible externally, though for local testing, you might use ClusterIP or port forwarding.
HPA YAML (nginx-hpa.yaml)
This configuration sets up HPA to scale the Nginx deployment based on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Explanation: The scaleTargetRef specifies the deployment to scale (nginx-deployment). minReplicas and maxReplicas set the scaling bounds (1 to 5 pods). The metrics section defines that HPA should monitor average CPU utilization and scale out when it exceeds 50%.
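CPU is not the only option: the same metrics list accepts memory (and, with extra adapters, custom metrics). A sketch of an additional entry for spec.metrics — the 70% threshold is illustrative; when several metrics are listed, HPA computes a replica count for each and uses the highest:

```yaml
# Extra entry for spec.metrics alongside the cpu metric (illustrative threshold):
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
```

As with CPU, utilization targets for memory are measured against the container's resource request.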
Applying the Configurations
Use the following commands to apply the configurations:
kubectl apply -f nginx-deployment.yaml
kubectl apply -f nginx-service.yaml
kubectl apply -f nginx-hpa.yaml
Verifying HPA Scaling
Once the configurations are applied, you can test and verify HPA’s scaling behavior.
Simulating Load
- Port-Forward the Service: Make the service accessible locally:
kubectl port-forward svc/nginx-service 8080:80
- Generate Load: Use curl or a load-testing tool to simulate traffic:
while true; do curl http://localhost:8080; sleep 0.1; done
This command continuously sends requests to the Nginx service, increasing CPU usage.
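Nginx serves static pages cheaply, so a single local curl loop may not push CPU past the threshold. An in-cluster load generator, modeled on the pattern in the Kubernetes HPA walkthrough, usually produces more pressure. This pod manifest is a sketch; the service name matches the one created above:

```yaml
# Throwaway load-generator pod; remove it with: kubectl delete pod load-generator
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox:1.36
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://nginx-service; done"]
```

Running the generator inside the cluster also exercises the service's cluster DNS name rather than the port-forward tunnel.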
Checking Scaling
Monitor the scaling behavior with these commands:
- Check pods:
kubectl get pods
- Check HPA status:
kubectl get hpa
You should see the number of pods increase if CPU utilization exceeds 50%. For example, the output of kubectl get hpa might look like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa Deployment/nginx-deployment 70%/50% 1 5 3 10m
This indicates that CPU usage is at 70% (above the 50% target), so HPA has scaled to 3 replicas.
Troubleshooting Common HPA Issues
Sometimes, HPA may not scale as expected due to issues with the metrics-server, which provides resource usage data. Common problems include the metrics-server pod not being ready or misconfigurations.
Potential Issues and Solutions
- Metrics-Server Not Ready: If kubectl get hpa shows no targets or errors, check the metrics-server.
- Check Metrics-Server Logs (append the actual pod-name suffix to the command):
kubectl logs -n kube-system metrics-server-
- View Events:
kubectl get events -n kube-system
- Verify API Server Access: Ensure the metrics-server can communicate with the Kubernetes API server. Network policies or RBAC issues might block access.
- Check Permissions: The metrics-server needs the correct RBAC permissions. Verify the role bindings:
kubectl get clusterrolebinding metrics-server
Ensure that the metrics-server cluster role binding exists and has the necessary permissions to access resource metrics.
- Fix Certificate Issues: For local clusters, certificate errors are common. Patch the metrics-server deployment to allow insecure TLS (this disables kubelet certificate verification, so use it only in local or development clusters):
kubectl patch deployment metrics-server -n kube-system \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"metrics-server","command":["/metrics-server","--kubelet-insecure-tls"]}]}}}}'
After patching, verify the pod status:
kubectl get pods -n kube-system
Once resolved, HPA should start scaling correctly.
Manual Scaling of Pods
If HPA isn’t scaling as expected or you need immediate adjustments, you can manually scale pods using kubectl scale.
Commands
- Scale Up (e.g., to 3 replicas):
kubectl scale deployment nginx-deployment --replicas=3
- Scale Down (e.g., to 1 replica):
kubectl scale deployment nginx-deployment --replicas=1
Manual scaling is useful for quick adjustments but should be used cautiously, as HPA will eventually override manual changes based on metrics.
Configuring Advanced Scaling Behavior: Stabilization Window
HPA includes advanced options like the stabilization window, which controls how quickly scaling decisions are made to prevent rapid oscillations (e.g., scaling up and down too frequently).
What is Stabilization Window?
The stabilization window specifies the time period HPA waits before making scaling decisions. For scaling down, a longer window prevents premature reduction of pods, ensuring stability.
Configuring Stabilization Window
You can modify the HPA using kubectl patch or by updating the YAML.
Using kubectl patch
To set a 60-second stabilization window for scaling down:
kubectl patch hpa nginx-hpa -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":60}}}}'
Equivalent YAML Update
Update the nginx-hpa.yaml
to include scaling behavior:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
Explanation: The stabilizationWindowSeconds: 60 setting ensures that HPA waits 60 seconds before scaling down, reducing the risk of rapid scaling oscillations. The Percent policy permits removing up to 100% of the excess replicas per 60-second period, and selectPolicy: Max picks the most permissive policy when several are listed.
Apply the updated YAML:
kubectl apply -f nginx-hpa.yaml
Conclusion
We've explored how to autoscale containers with the Kubernetes Horizontal Pod Autoscaler (HPA): deploying a sample Nginx application, configuring CPU-based scaling, verifying the behavior under load, troubleshooting metrics-server issues, and tuning scale-down stability. With these pieces in place, your workloads can track demand automatically instead of relying on manual scaling.