Scaling patterns in Kubernetes: VPA, HPA and KEDA
I've been working with a few pod scaling options in the past, but last year I started working with KEDA, so I thought I'd write a post explaining the options available for pod autoscaling. Manually adjusting resource parameters, by contrast, is not only slow but also inefficient: allocate too little and you'll deliver a subpar user experience or suffer application outages; over-provision "just in case" and you'll waste money and resources. That's where Kubernetes autoscaling comes to the rescue, delivering the right resources when they're required.
Understanding Kubernetes Pod Autoscaling Fundamentals
Autoscaling in Kubernetes means dynamically allocating cluster resources like CPU and memory to your applications based on real-time demand. This ensures applications have the right amount of resources to handle varying levels of load, directly improving application performance and availability.
Key Benefits of Autoscaling:
- Cost Efficiency: Pay only for the resources you need instead of over-provisioning
- Environmental Impact: Reduced power consumption and carbon emissions through better resource alignment
- Time Savings: Automates manual resource adjustment tasks, freeing up valuable DevOps time
- Performance Optimization: Ensures applications maintain optimal performance under varying loads
The Three Pillars of Kubernetes Autoscaling
Kubernetes offers three primary autoscaling mechanisms, each serving different purposes:
Vertical Pod Autoscaler (VPA) - Adjusts resource requests and limits within individual pods
Horizontal Pod Autoscaler (HPA) - Scales the number of pod replicas up or down
Kubernetes Event-Driven Autoscaler (KEDA) - Scales based on external events and custom metrics
Let's explore each of them one by one.
Vertical Pod Autoscaler (VPA): Right-sizing Your Pods
What is VPA?
The Vertical Pod Autoscaler automatically adjusts the CPU and memory requests and limits of individual containers within pods based on historical usage patterns. Instead of scaling the number of pods, VPA makes your existing pods "beefier" or "leaner" based on their actual resource needs.
How VPA Works
VPA operates through three core components:
Recommender: Calculates optimal resource values based on historical metrics from the Kubernetes Metrics Server, analyzing up to 8 days of data to generate recommendations.
Updater: Monitors recommendation changes and evicts pods when resource adjustments are needed, forcing replacement with updated allocations.
Admission Webhook: Intercepts new pod deployments and injects updated resource values based on VPA recommendations.
When to Use VPA
VPA is ideal for:
- Stateful applications that can't be easily scaled horizontally
- Resource optimization scenarios where you need to fine-tune individual pod resources
- Applications with unpredictable resource patterns that traditional static allocation can't handle
- Cost optimization efforts to eliminate resource waste.
VPA configuration example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "my-app"
  updatePolicy:
    updateMode: "Auto"   # other modes: "Off" (recommend only), "Initial" (apply at pod creation)
  resourcePolicy:
    containerPolicies:
      - containerName: '*'   # applies to all containers in the pod
        maxAllowed:
          cpu: 1
          memory: 500Mi
        minAllowed:
          cpu: 100m
          memory: 50Mi
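Once the Recommender has gathered enough history, you can inspect its output with kubectl describe vpa my-app-vpa. The recommendation shows up in the object's status and looks roughly like the sketch below (the shape matches the VPA API; the numbers are made up for illustration):
status:
  recommendation:
    containerRecommendations:
      - containerName: my-app
        lowerBound:          # the minimum the container is likely to need
          cpu: 120m
          memory: 100Mi
        target:              # the request values VPA will actually apply
          cpu: 250m
          memory: 256Mi
        upperBound:          # above this, extra resources are unlikely to help
          cpu: "1"
          memory: 500Mi
In "Auto" mode the Updater evicts pods whose current requests drift too far from the target, and the Admission Webhook injects the new values when the pods are recreated.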
Challenges with VPA
Despite its benefits, VPA has several limitations:
- Incompatibility with HPA: Cannot run both tools together for CPU/memory-based scaling
- Limited historical data: Only stores 8 days of metrics, losing data on pod restarts
- Service disruption: Pod evictions cause momentary service interruptions
- No time-based controls: Pod evictions can happen at any time, including peak hours
- Cluster-wide configuration: Limited per-workload customization options
Horizontal Pod Autoscaler (HPA): Scaling Out Your Application
What is HPA?
HPA automatically scales the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics like CPU utilization, memory usage, or custom metrics. It's the most fundamental and widely-used autoscaling pattern in Kubernetes.
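Under the hood, HPA uses a simple proportional rule documented in the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). For example, if 4 replicas are averaging 80% CPU against a 50% target, HPA scales to ceil(4 × 80 / 50) = 7 replicas.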
How HPA Overcomes VPA Challenges
While VPA adjusts resources within pods, HPA takes a different approach:
- No service disruption: Scaling replicas doesn't require pod eviction
- Works with stateless applications: Perfect for horizontally scalable workloads
- Predictable scaling: Based on well-understood metrics like CPU and memory
- Mature and stable: Built-in Kubernetes feature with extensive community support
When to Use HPA
HPA is perfect for:
- Stateless applications where pods are interchangeable
- Predictable workloads with clear load patterns
- Web applications that experience traffic variations
- Microservices that can benefit from horizontal scaling
HPA configuration example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale to keep average CPU at ~50% of requests
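Beyond the basic target, the autoscaling/v2 API also exposes an optional behavior stanza for tuning how aggressively HPA reacts; a scale-down stabilization window, for example, helps avoid replica flapping. A minimal sketch that would sit under spec in the HPA above (the values are illustrative, not a recommendation):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes of low load before scaling down
    policies:
      - type: Percent
        value: 50                     # remove at most half the replicas...
        periodSeconds: 60             # ...per minute
For quick experiments, kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10 creates an equivalent basic policy imperatively.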
HPA Limitations
While HPA is powerful, it has constraints:
- Limited to resource metrics: Out of the box, HPA only scales on CPU/memory; custom and external metrics require a metrics adapter
- Not suitable for event-driven workloads: Can't scale based on queue lengths or custom events
- Reactive scaling: Only responds after metrics breach thresholds
- No scale-to-zero: Cannot scale down to zero replicas
KEDA: Event-Driven Autoscaling for Modern Applications
What is KEDA?
Kubernetes Event-Driven Autoscaling (KEDA) extends Kubernetes' native autoscaling capabilities to allow applications to scale based on events from various sources like message queues, databases, or custom metrics. KEDA graduated as a CNCF project, highlighting its importance in the cloud-native ecosystem.
How KEDA Overcomes HPA Limitations
KEDA addresses several HPA shortcomings:
- Event-driven scaling: Scales based on queue lengths, database records, HTTP requests, and more
- Scale-to-zero capability: Can scale applications down to zero when no events are present
- Rich ecosystem: Supports 50+ event sources including Kafka, RabbitMQ, Azure Service Bus, AWS SQS
- Custom metrics: Works with any metric source through external scalers
When to Use KEDA
KEDA excels in:
- Event-driven architectures with message queues and event buses
- Serverless-style workloads that benefit from scale-to-zero
- Batch processing jobs triggered by data availability
- IoT applications processing sensor data streams
- Machine learning pipelines processing inference requests
KEDA configuration example
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
spec:
  scaleTargetRef:
    name: message-processor   # the Deployment to scale
  # minReplicaCount defaults to 0, which is what enables scale-to-zero
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: work-queue
        mode: QueueLength
        value: "5"             # target of 5 messages per replica
KEDA vs HPA: Key Differences
In short: HPA ships with Kubernetes and scales replicas on resource utilization (CPU and memory out of the box, custom metrics only via an adapter), and it cannot go below one replica. KEDA is installed on top of the cluster, scales on external events from 50+ sources, supports scale-to-zero, and under the hood it creates and manages a standard HPA to do the actual replica math, so the two are complementary rather than competing.
Choosing the Right Autoscaling Strategy
Use VPA When:
- You have stateful applications that can't scale horizontally
- Resource optimization is your primary concern
- You need to fine-tune individual pod resources
- Applications have unpredictable resource usage patterns
Use HPA When:
- You have stateless, horizontally scalable applications
- Traditional web applications with predictable load patterns
- Simple microservices that scale based on CPU/memory
- You need a proven, stable autoscaling solution
Use KEDA When:
- Building event-driven or serverless-style applications
- Processing messages from queues or streams
- Need to scale based on custom or external metrics
- Cost optimization through scale-to-zero is important
Real-World Implementation Scenarios
Scenario 1: E-commerce Platform
- Frontend services: HPA for web servers based on CPU utilization
- Order processing: KEDA for scaling based on order queue length
- Database connections: VPA for optimizing connection pool resources
Scenario 2: IoT Data Pipeline
- Data ingestion: KEDA scaling based on message queue depth
- Stream processing: HPA for consistent throughput requirements
- Analytics services: VPA for memory-intensive data processing
Scenario 3: Machine Learning Platform
- Model serving: HPA for inference API endpoints
- Training jobs: KEDA triggered by training request queues
- Feature processing: VPA for compute-intensive transformations
Best Practices and Recommendations
- Start Simple: Begin with HPA for basic scaling needs, then add KEDA for event-driven requirements
- Monitor and Adjust: Continuously monitor scaling behavior and adjust thresholds
- Combine Strategies: Use different autoscalers for different components of your application
- Set Resource Limits: Always define appropriate resource requests and limits to prevent runaway scaling (see the snippet after this list)
- Test Thoroughly: Validate autoscaling behavior under various load conditions
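On the resource-limits point above: HPA's Utilization targets are computed as a percentage of each container's requests, so autoscaled workloads should always declare both requests and limits. A minimal fragment of a Deployment's container spec (the numbers are illustrative):
resources:
  requests:          # HPA utilization percentages are measured against these
    cpu: 250m
    memory: 256Mi
  limits:            # hard caps that contain runaway usage
    cpu: "1"
    memory: 512Mi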
Conclusion
Kubernetes autoscaling is not a one-size-fits-all solution. The choice between VPA, HPA, and KEDA depends on your specific application requirements, architecture patterns, and operational needs. VPA optimizes resource utilization within pods, HPA provides reliable horizontal scaling for traditional workloads, and KEDA enables sophisticated event-driven scaling for modern cloud-native applications.
By understanding the strengths and limitations of each approach, you can design a comprehensive autoscaling strategy that optimizes both performance and cost while maintaining the reliability your applications demand.
If you are looking for a deeper-dive course with hands-on labs on Kubernetes autoscaling and KEDA, you can check out the official Linux Foundation course at no cost.