The Hidden Cost of Kubernetes
Kubernetes abstracts away servers so well that teams stop thinking about capacity. Nodes get provisioned for peak load and then run at 15% CPU utilisation 23 hours a day. We have audited over 30 EKS clusters in the last two years, and the average over-provisioning rate is 62%.
Step 1: Measure First
Install Kubecost (free tier covers most clusters) and let it run for two weeks. Then open the namespace-level cost allocation report: you will almost certainly find that 80% of cost comes from 20% of workloads. Focus there first.
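To make the 80/20 check concrete, here is a minimal sketch that ranks namespaces by cost and returns the smallest set that reaches 80% of total spend. The namespace names and dollar figures are hypothetical, and the input is assumed to be a simple mapping you have exported from the cost allocation report yourself; this does not call any Kubecost API.

```python
from typing import Dict, List, Tuple


def top_cost_drivers(
    costs: Dict[str, float], threshold: float = 0.80
) -> List[Tuple[str, float, float]]:
    """Return the smallest set of namespaces whose combined cost reaches
    `threshold` of the total, as (namespace, cost, cumulative_share)."""
    total = sum(costs.values())
    if total <= 0:
        return []
    # Most expensive namespaces first.
    ranked = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
    drivers = []
    running = 0.0
    for namespace, cost in ranked:
        running += cost
        drivers.append((namespace, cost, running / total))
        if running / total >= threshold:
            break
    return drivers


# Hypothetical monthly cost per namespace, in USD.
monthly_costs = {
    "checkout": 4200.0,
    "search": 3100.0,
    "ml-inference": 900.0,
    "staging": 800.0,
    "monitoring": 500.0,
    "internal-tools": 300.0,
    "sandbox": 200.0,
}

for namespace, cost, share in top_cost_drivers(monthly_costs):
    print(f"{namespace:15s} ${cost:8.2f}  cumulative {share:.0%}")
```

With these made-up numbers, three of the seven namespaces cover the 80% threshold, which is where right-sizing effort would go first.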
Step 2: Right-Size Requests and Limits
Resource requests are the #1 driver of wasted spend. The scheduler reserves capacity based on requests, not actual usage, so an application that requests 2 CPU cores but averages 200m still reserves 2 cores (and you still pay for the nodes that provide them), even on an autoscaling cluster. Use Goldilocks (from Fairwinds) to get VPA-based right-sizing recommendations for every deployment, then apply them incrementally, load testing after each change.
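The arithmetic behind right-sizing can be sketched as follows. This is not how Goldilocks or the VPA recommender computes its numbers; it is a simplified illustration that assumes you have a list of observed CPU usage samples in millicores. The 95th percentile, 20% headroom, and 50m floor are illustrative defaults, not recommendations from the tools above.

```python
def wasted_fraction(request_millicores: int, avg_usage_millicores: int) -> float:
    """Share of the CPU request that is reserved but unused on average."""
    return 1 - avg_usage_millicores / request_millicores


def recommend_cpu_request(
    usage_millicores: list,
    percentile: int = 95,    # assumed target percentile of observed usage
    headroom_pct: int = 20,  # assumed safety margin on top of that percentile
    floor: int = 50,         # assumed minimum request in millicores
) -> int:
    """Suggest a CPU request: nearest-rank percentile of usage plus headroom."""
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile, using integer ceiling division.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    base = ordered[rank - 1]
    # Add headroom and round up to a whole millicore.
    suggested = (base * (100 + headroom_pct) + 99) // 100
    return max(floor, suggested)


# The example from the text: 2 cores requested, 200m used on average.
print(f"wasted: {wasted_fraction(2000, 200):.0%}")

# Hypothetical usage samples (millicores) including one spike.
samples = [150, 180, 200, 210, 190, 220, 260, 200, 170, 400]
print(f"suggested request: {recommend_cpu_request(samples)}m")
```

Sizing to a high percentile rather than the average keeps the spike inside the request, which is why the suggestion lands well above 200m but still far below the original 2 cores.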
Step 3: Node Autoscaling with Karpenter
Replace the Cluster Autoscaler with Karpenter. Karpenter provisions nodes in under 60 seconds (vs. 3–5 minutes with the Cluster Autoscaler), consolidates underutilised nodes automatically, and can mix instance types and Spot/On-Demand capacity in a single NodePool. In our benchmarks, Karpenter reduces node count by 20–35% compared to the Cluster Autoscaler on identical workloads.
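To see why consolidation reduces node count, consider the toy bin-packing sketch below. It uses a first-fit decreasing heuristic on CPU requests only and is purely illustrative: it is not Karpenter's actual algorithm, which also weighs memory, instance types, price, and scheduling constraints. The pod sizes and node capacity are hypothetical.

```python
def nodes_needed(pod_cpu_requests: list, node_capacity: int) -> int:
    """First-fit decreasing: how many nodes of `node_capacity` millicores
    are needed to hold all pods, looking at CPU requests only."""
    free = []  # remaining millicores on each node opened so far
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"pod requesting {request}m cannot fit on any node")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] -= request  # place the pod on the first node with room
                break
        else:
            free.append(node_capacity - request)  # open a new node
    return len(free)


# Hypothetical pods (millicores) currently spread thinly across 4 nodes.
pods = [2000, 1500, 1000, 1000, 500, 500, 500, 1000]
current_nodes = 4
packed = nodes_needed(pods, node_capacity=4000)
print(f"{current_nodes} nodes today, {packed} after repacking")
```

Note that the result depends entirely on the requests: oversized requests from Step 2 inflate the node count no matter how well the autoscaler packs.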
Step 4: Spot Instances for Stateless Workloads
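Most web services, batch jobs, and ML inference workloads are stateless and can tolerate Spot's 2-minute interruption notice. Running them on Spot instances saves 60–80% on compute vs On-Demand. Spread replicas across nodes and zones with topology spread constraints, and protect them with PodDisruptionBudgets, so that you do not lose more than N% of replicas at once. Karpenter's spec.disruption.budgets complements this by capping how many nodes in a NodePool Karpenter voluntarily disrupts at a time (for example during consolidation); it does not limit Spot interruptions themselves.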
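The overall saving depends on how much of your compute spend is actually eligible for Spot. A minimal sketch of that arithmetic follows; the $10,000 baseline and the 50% eligible share are hypothetical inputs, and the discount range is the 60–80% quoted above.

```python
def blended_compute_cost(
    on_demand_cost: float, spot_fraction: float, spot_discount: float
) -> float:
    """Monthly compute cost after moving `spot_fraction` of On-Demand spend
    to Spot at a discount of `spot_discount` (both expressed as 0..1)."""
    if not 0 <= spot_fraction <= 1:
        raise ValueError("spot_fraction must be between 0 and 1")
    if not 0 <= spot_discount <= 1:
        raise ValueError("spot_discount must be between 0 and 1")
    return on_demand_cost * (1 - spot_fraction * spot_discount)


baseline = 10_000.0  # hypothetical monthly On-Demand compute spend, USD
eligible = 0.5       # hypothetical share of that spend that is stateless

for discount in (0.6, 0.7, 0.8):
    cost = blended_compute_cost(baseline, eligible, discount)
    saving = 1 - cost / baseline
    print(f"Spot discount {discount:.0%}: ${cost:,.0f}/month ({saving:.0%} saved)")
```

With half the spend eligible, a 60–80% Spot discount translates to a 30–40% reduction in total compute cost, so the stateless share matters as much as the discount itself.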
Step 5: Scheduled Scale-Down at Night
Implement KEDA (Kubernetes Event-driven Autoscaling) with a cron scaler to scale down non-critical workloads outside business hours. A staging cluster that runs at full capacity 24/7 is a common oversight; scheduled scale-to-zero can cut staging costs by about 70%.
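As a sketch of what this can look like, the snippet below builds a KEDA ScaledObject with a cron trigger as a Python dictionary and prints it as JSON, which kubectl apply -f accepts alongside YAML. The deployment name, namespace, timezone, schedule, and replica count are all hypothetical. It also works out the share of the week that falls outside a 10-hour, 5-day schedule, which is where a figure of roughly 70% comes from if cost scales with running hours.

```python
import json


def cron_scaled_object(
    name: str, namespace: str, start: str, end: str, timezone: str, replicas: int
) -> dict:
    """Build a KEDA ScaledObject that runs `replicas` pods between the
    `start` and `end` cron expressions and scales to zero otherwise."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{name}-business-hours", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"name": name},  # targets a Deployment by default
            "minReplicaCount": 0,              # allow scale-to-zero off-hours
            "triggers": [
                {
                    "type": "cron",
                    "metadata": {
                        "timezone": timezone,
                        "start": start,
                        "end": end,
                        # KEDA trigger metadata values are strings.
                        "desiredReplicas": str(replicas),
                    },
                }
            ],
        },
    }


def off_hours_fraction(hours_per_day: int, days_per_week: int) -> float:
    """Share of a 168-hour week in which the workload is scaled to zero."""
    return 1 - (hours_per_day * days_per_week) / (24 * 7)


# Hypothetical staging deployment, up 08:00 to 18:00 on weekdays.
manifest = cron_scaled_object(
    "web", "staging", "0 8 * * 1-5", "0 18 * * 1-5", "Europe/London", 3
)
print(json.dumps(manifest, indent=2))
print(f"scaled to zero for {off_hours_fraction(10, 5):.0%} of the week")
```

Scaling pods to zero only saves money if the nodes underneath are removed as well, so this step relies on the node autoscaling from Step 3.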
Results Across Our Client Portfolio
We applied this playbook to 8 production EKS clusters in 2024. The average cost reduction was 41%, ranging from 28% to 57% depending on initial over-provisioning levels. The median time to implement was 6 weeks.

