[AWS] EKS Auto Mode Node lifecycle [EKS]

May 9, 2025 - 00:30

Introduction

Node Lifecycle

Nodes launched by EKS Auto Mode have a maximum lifetime of 21 days (which you can reduce), after which they are automatically replaced with new nodes.

By default, the NodePool terminates instances after 336 hours (14 days).
https://docs.aws.amazon.com/eks/latest/userguide/create-node-pool.html

  spec:
    template:
      spec:
        expireAfter: 336h

The setting above relies on Karpenter's node disruption mechanism.
https://karpenter.sh/docs/concepts/disruption/

Karpenter automatically discovers disruptable nodes and spins up replacements when needed.

Concept of Disruption Controller

The Disruption Controller first decides the priority of the nodes that are candidates for disruption.
For each candidate node, it then checks the disruption budget.

The budget is set in spec.disruption.budgets. If it is undefined, Karpenter defaults to a single budget of nodes: 10%.

  spec:
    disruption:
      budgets:
      - nodes: 10%
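
Budgets can also be limited to a time window. The following is a sketch based on the schedule and duration fields described in the Karpenter disruption documentation linked above. The percentage and the cron window are illustrative assumptions, and I have not confirmed that every budget field behaves the same way under EKS Auto Mode.

```yaml
# Sketch only: values are illustrative assumptions, not recommendations.
spec:
  disruption:
    budgets:
      # At most 10% of this NodePool's nodes may be disrupted at once.
      - nodes: 10%
      # Allow no voluntary disruption for 8 hours starting at 09:00 UTC,
      # Monday to Friday.
      - nodes: "0"
        schedule: "0 9 * * 1-5"
        duration: 8h
```

When several budgets are active at the same time, Karpenter applies the most restrictive one.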

Next, it determines whether replacement nodes are needed.

        taints:
        - effect: NoSchedule
          key: CriticalAddonsOnly
        terminationGracePeriod: 24h0m0s

Tainting a node with CriticalAddonsOnly prevents Pods that do not tolerate the taint (in practice, anything other than system Pods) from being scheduled on that node.

Karpenter then waits until the replacement node has started.

Finally, it deletes the node(s) and waits for the Termination Controller to gracefully shut down the node(s).

Consolidation is configured by consolidationPolicy and consolidateAfter.

  spec:
    disruption:
      budgets:
      - nodes: 10%
      consolidateAfter: 30s

Increasing consolidateAfter delays node replacement to a certain extent, which is useful when an application running on EKS is slow to start.
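
The snippet above omits consolidationPolicy. A minimal sketch with both fields set could look like this; the 10m delay is an assumed value for a slow-starting workload, not a recommendation.

```yaml
# Sketch only: the 10m delay is an illustrative assumption.
spec:
  disruption:
    # WhenEmptyOrUnderutilized also considers underutilized nodes;
    # WhenEmpty limits consolidation to nodes with no workload Pods.
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Wait 10 minutes after the last Pod was added to or removed from
    # a node before that node is considered for consolidation.
    consolidateAfter: 10m
    budgets:
      - nodes: 10%
```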

Multi Node Consolidation - Try to delete two or more nodes in parallel, possibly launching a single replacement whose price is lower than that of all nodes being removed.

In this way, Karpenter improves node resource efficiency automatically by changing the node instance type.

Using preferred anti-affinity and topology spreads can reduce the effectiveness of consolidation.

When Pods use anti-affinity or topology spread constraints, those scheduling constraints take precedence over consolidation.
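
For illustration, a "preferred" topology spread is one declared with whenUnsatisfiable: ScheduleAnyway. The Deployment below is a hypothetical sketch (the name web, the label, and the nginx image are placeholders I made up) showing the kind of constraint that can limit how far nodes are consolidated.

```yaml
# Hypothetical workload: names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        # Prefer spreading replicas across nodes, but do not block scheduling.
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx
```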

If interruption-handling is enabled, Karpenter will watch for upcoming involuntary interruption events that would cause disruption to your workloads.

It is advisable to monitor these interruption events.

Node Auto Repair is a feature that automatically identifies and replaces unhealthy nodes in your cluster, but it is currently an alpha feature.

Because EKS does not allow feature gates to be enabled for APIs that are not GA, I believe it cannot be used.

Try Custom NodePool

Get the NodePools and save them to a file:

kubectl get nodepools -o yaml > nodepools.yaml

Edit the expireAfter parameter in nodepools.yaml, then apply the file:

kubectl apply -f nodepools.yaml

The settings are reflected immediately.
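
For reference, a custom NodePool with a shorter lifetime might look like the sketch below. It follows the shape of the example in the AWS create-node-pool page linked earlier, but the name short-lived, the instance categories, and the 168h value are assumptions chosen for illustration.

```yaml
# Sketch only: name, requirements, and durations are illustrative assumptions.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: short-lived
spec:
  template:
    spec:
      # Reference to the NodeClass managed by EKS Auto Mode.
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
        - key: "eks.amazonaws.com/instance-category"
          operator: In
          values: ["c", "m", "r"]
      # Replace nodes after 7 days instead of the default 336h.
      expireAfter: 168h
      terminationGracePeriod: 24h0m0s
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: 10%
```

Saving this as a file and running kubectl apply -f against it creates the NodePool; kubectl get nodepools should then list it.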