Why Scale To Zero?

What's the point of scaling to zero? Seriously, I don't understand why anybody would want to consider this as a goal in their architecture as it complicates things for nearly zero benefit.
Savings Are Close To Zero
Let's take a microservice application with a database, a message broker, and several frontend and backend containers. Some parts scale up easily, others are more complicated, but the point is that you need at least one instance of each component for the whole application to work properly. We also address fault tolerance by using only managed services and a Kubernetes cluster with multiple nodes.
The costs could look like this:
- $ 1,700 Kubernetes Cluster (3 nodes)
- $ 100 Message Broker (as Service)
- $ 150 Database (as Service)
Sum: $ 1,950
If this setup is enough for 10,000 users, we scale up for 1,000,000 users (Congratulations!) with this setup:
- $ 28,000 Kubernetes Cluster (50 nodes)
- $ 10,000 Message Broker (as Service)
- $ 15,000 Database (as Service)
Sum: $ 53,000
...and we can scale back down to $ 1,950 just like that. No architectural change is needed: just shut down the additional instances we used for scaling up to 1,000,000 users and we save 96.3 % of our costs.
Scaling to zero would only save an additional 3.7 %, which is roughly equivalent to going from 960,000 users to 1,000,000 users. Did we think about that step the same way we think about scaling from one to zero? Probably not, because it's too small. We would rather focus on scaling up from 800,000 to 1,000,000 users or something like that.
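To make the arithmetic explicit, here is a small sketch that recomputes both percentages from the example prices above. The dictionary keys and variable names are my own; only the dollar figures come from the two lists.

```python
# Monthly costs taken from the two example setups above (USD).
minimal_setup = {"kubernetes_cluster": 1_700, "message_broker": 100, "database": 150}
scaled_setup = {"kubernetes_cluster": 28_000, "message_broker": 10_000, "database": 15_000}

minimal_cost = sum(minimal_setup.values())  # 1,950
scaled_cost = sum(scaled_setup.values())    # 53,000

# Share of the peak bill saved by scaling back down to the minimal setup.
saved_by_scaling_down = (scaled_cost - minimal_cost) / scaled_cost * 100

# Share of the peak bill that scaling to zero could save on top of that.
saved_by_scaling_to_zero = minimal_cost / scaled_cost * 100

print(f"Scale down to minimum: {saved_by_scaling_down:.1f} % saved")
print(f"Scale to zero on top:  {saved_by_scaling_to_zero:.1f} % saved")
```

Whatever prices you plug in, the second number can never be larger than the share your minimal setup has of the peak bill.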
Keep The Architecture Simple
Scaling up and down between the minimum number of instances and what's required for 1,000,000 users does not require any architectural change. Just keep the bare minimum of instances running, and as soon as traffic goes up Kubernetes is able to (auto-)scale horizontally. But scaling to zero is a different kind of beast. It means you really shut down components completely and need to wake them up somehow, e.g. by issuing an API call to Kubernetes to start the very first container of your frontend. This requires additional tooling or additional components. Think about the following insane approach of scaling to zero:
- There's a worker queue in your message broker with a task waiting to be processed, but since you scaled your worker containers down to zero, you now need to implement some kind of monitor that checks all queues for waiting messages and scales the corresponding container up from zero to one.
Please don't even think about shutting down your message broker when your monitor script finds all queues empty. I have already heard people suggest building another messaging layer for such events, just to inform yet another always-running component that it now has to start the container with the "real" message broker.
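To show how much extra machinery even the simple queue monitor implies, here is a rough sketch of it. Everything in it is hypothetical: `reconcile_once`, `run_monitor` and the callables passed in are names I made up, and a real version would have to talk to your broker and to the Kubernetes API instead of to in-memory dictionaries.

```python
import time
from typing import Callable, Dict, List, Optional


def reconcile_once(
    queue_depths: Dict[str, int],
    replicas: Dict[str, int],
    scale_up: Callable[[str], None],
) -> List[str]:
    """Wake every worker that has waiting messages but zero replicas."""
    woken = []
    for queue, depth in queue_depths.items():
        if depth > 0 and replicas.get(queue, 0) == 0:
            scale_up(queue)  # hypothetical: a real monitor would call the cluster API here
            woken.append(queue)
    return woken


def run_monitor(
    get_queue_depths: Callable[[], Dict[str, int]],
    get_replicas: Callable[[], Dict[str, int]],
    scale_up: Callable[[str], None],
    interval_seconds: float = 10.0,
    max_iterations: Optional[int] = None,
) -> None:
    """Poll forever (or max_iterations times). This loop itself never scales to zero."""
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        reconcile_once(get_queue_depths(), get_replicas(), scale_up)
        iteration += 1
        time.sleep(interval_seconds)


# In-memory stand-ins, for illustration only.
replicas = {"emails": 0, "invoices": 0}
depths = {"emails": 3, "invoices": 0}


def fake_scale_up(queue: str) -> None:
    replicas[queue] = 1


woken = reconcile_once(depths, replicas, fake_scale_up)
print(woken, replicas)
```

Note what the sketch leaves out: scaling back down, retries, permissions for the cluster API, and somebody to monitor the monitor. And `run_monitor` is exactly the always-running component you were trying to get rid of.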
What About Test Environments?
You don't want a different architecture in your test environment than in your production environment! So you don't want additional components just to be able to scale to zero. If really nobody is using your test environment for quite a while, just shut it down completely and start it up manually as soon as you need it.
What About Infrequent Usage?
Do you mean user-driven infrequent usage? I'd say your minimal setup is still the best way to go. If the number of users is so small that they don't cover your server costs, you had better think about a way to attract more users instead of implementing scale-to-zero.
And what about cronjobs that only run once a month? If this is the only usage of your application you can call yourself lucky, because then you can plan your usage exactly. Just start up and shut down your environment in the same cronjob. Note that a planned shutdown is not what it means for an application to be able to (automatically!) scale to zero.
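A minimal sketch of such a cronjob wrapper, assuming you already have some way to start and stop your environment. The three callables are placeholders of mine, not a real API.

```python
from typing import Callable


def run_monthly_job(
    start_environment: Callable[[], None],
    run_job: Callable[[], None],
    stop_environment: Callable[[], None],
) -> None:
    """Start the environment, run the job, and always shut down afterwards."""
    start_environment()
    try:
        run_job()
    finally:
        # Runs even if the job raises, so a failed run does not leave
        # the environment up (and billing) until next month.
        stop_environment()


# Stand-ins for illustration; real ones would call your provider's tooling.
events = []
run_monthly_job(
    start_environment=lambda: events.append("start"),
    run_job=lambda: events.append("job"),
    stop_environment=lambda: events.append("stop"),
)
print(events)
```

That's a plain script with a `try`/`finally`, not an architecture. If starting the environment itself fails, nothing is shut down, which seems like the right behaviour for a sketch this small.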
Do I Have A Clue?
Not really, so I just googled "scale to zero" and found this guy on Reddit, who explores options for scaling to zero in Kubernetes. The very first questions he asks:
- What [additional] tools are required?
- What do they cost?
That's exactly what I'm complaining about! No, it's worse: not only do these additional tools mess up your beautiful, simple architecture, but unless they are free they also eat up the tiny savings you could have achieved.
tl;dr
Please don't waste your time trying to make your application scalable to zero.