Improve container boot time by lazy loading with SOCI


Apr 12, 2025 - 14:19

Managing a big sale event is not simple, and autoscaling alone is not enough: scaling out takes time, mostly because the new containers on the hosts need time to boot before they can process requests. This is exactly the problem we are trying to solve today, and it is where a recent offering from AWS, “SOCI”, can help us!

Originally published here

Booting Containers — before SOCI

To speed up container boot time, we first look at the relevant container orchestration steps performed by ECS.

When the AWS ECS Agent provisions a new ECS task on the host, whether because of auto-scaling or scheduling, the snapshotter first has to pull the container image onto the task. Only once the image is downloaded and decompressed is the container started.

Figure: The ECS Agent provisioning a task, which pulls the image from ECR and starts the container (the traditional way)

Factors affecting container boot-time

From this flow, two key points are evident:

  1. Image size directly impacts the time taken to boot the containers.

  2. The image has to be downloaded and decompressed before the container can be started, which is unnecessary blocking.

On point 1: image size is driven by application requirements and can be reduced significantly through optimization techniques such as multi-stage builds. But even with a smaller image, the task still has to wait for the whole image (which could still be large) to download every time before a container can start serving requests.
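To make the two factors concrete, here is a minimal back-of-the-envelope sketch of the blocking flow. The function name, the throughput figures, and the image sizes are all hypothetical values chosen for illustration, not measurements from ECS.

```python
def blocking_boot_seconds(image_mb, download_mb_per_s, decompress_mb_per_s, start_s):
    """Time until the container can serve requests when the full image
    must be downloaded and decompressed before the container starts."""
    download_s = image_mb / download_mb_per_s
    decompress_s = image_mb / decompress_mb_per_s
    return download_s + decompress_s + start_s


# Hypothetical numbers: a 2000 MB image versus the same image slimmed to 500 MB.
for image_mb in (2000, 500):
    seconds = blocking_boot_seconds(
        image_mb, download_mb_per_s=100, decompress_mb_per_s=200, start_s=5
    )
    print(f"{image_mb} MB image -> {seconds:.1f} s before the first request")
# prints:
# 2000 MB image -> 35.0 s before the first request
# 500 MB image -> 12.5 s before the first request
```

Under these assumed numbers, shrinking the image helps, but the download and decompression terms still sit in front of every container start.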

This matters most during a sale, when there is a sudden spike in traffic and ECS decides to double the containers (200 -> 400). Requests may queue up while the new containers boot, and that is not an experience we want for our customers. What’s worse, we overprovision our resources as a safety net against this.

Lazy Loading with SOCI - for faster boot-times

So our focus today is on point 2. Instead of waiting for the whole image to download and decompress, we pull only some of the files in the image from the registry and start the container with those, while the rest are lazy loaded in the background. With this non-blocking approach, the container starts much earlier than it otherwise would, as shown in the figure:

Figure: Loading the container image asynchronously
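The idea can be sketched with a toy model. This is not how SOCI is implemented; `LazyImage`, its methods, and the file names and sizes are hypothetical, and a plain dictionary stands in for the registry.

```python
class LazyImage:
    """Toy model of a lazily loaded image: a file is fetched from the
    'registry' only the first time something reads it."""

    def __init__(self, remote_files):
        self.remote = remote_files  # stands in for the registry
        self.local = {}             # files already present on the host
        self.fetched_bytes = 0

    def read(self, path):
        if path not in self.local:  # on-demand fetch on first access
            data = self.remote[path]
            self.local[path] = data
            self.fetched_bytes += len(data)
        return self.local[path]

    def prefetch_rest(self):
        """Background step: pull everything not fetched yet."""
        for path in self.remote:
            self.read(path)


image = LazyImage({
    "/app/server": b"x" * 1_000,
    "/app/assets.bin": b"x" * 50_000,
    "/usr/share/docs": b"x" * 20_000,
})

image.read("/app/server")   # the only file needed to start serving
print(image.fetched_bytes)  # prints 1000
image.prefetch_rest()       # the rest arrives after the container is up
print(image.fetched_bytes)  # prints 71000
```

In this sketch the container could start after 1,000 of the 71,000 bytes have arrived; in the blocking flow it would wait for all of them.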

Challenges for lazy loading

But wait… we don’t know which file is present in which layer of the image, and how would we pull only some of the files?
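To illustrate the first challenge, here is a minimal sketch that builds two small tar layers in memory and scans them to find out which layer holds which file. It assumes uncompressed layers and ignores details such as deleted-file markers. The helper names and file contents are hypothetical, and this is not SOCI's indexing format.

```python
import io
import tarfile


def make_layer(files):
    """Build an uncompressed tar layer in memory from {path: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def index_layers(layers):
    """Map each file path to (layer number, data offset, size).
    Later layers override earlier ones, as in a layered image."""
    index = {}
    for layer_no, blob in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
            for member in tar:
                if member.isfile():
                    index[member.name] = (layer_no, member.offset_data, member.size)
    return index


layers = [
    make_layer({"etc/config.yaml": b"debug: false\n", "app/main.py": b"print('v1')\n"}),
    make_layer({"app/main.py": b"print('v2')\n"}),
]
index = index_layers(layers)

# With the index, one file can be read by slicing a single layer.
layer_no, offset, size = index["app/main.py"]
print(layer_no, layers[layer_no][offset:offset + size])
```

Note that building this index required reading every layer from start to end, which is exactly the work lazy loading wants to avoid at boot time. Real image layers are also typically compressed, so a byte offset into the uncompressed tar is not enough on its own to fetch a single file.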