In-Depth Guide to Docker Images

Leapcell: The Best of Serverless Web Hosting
In-depth Analysis of Docker Images
I. Overview of Docker Images
As the foundation of containers, a Docker image represents the contents of a container's file system: it is a read-only template from which Docker containers are created. Technically, Docker images use a layered design. Apart from a base image, every image is produced by stacking new content on top of an existing image. The metadata of each layer is stored in a `json` file. This metadata describes not only the static contents of the file system but also dynamic information, such as the image's creation time and the build instruction that produced the layer.
1.1 Building Docker Images
Docker images are usually built from a `Dockerfile`. A `Dockerfile` is a text file containing a series of instructions that define the image's base environment, install software packages, copy files, and perform other operations. During a build, Docker creates the image's layers one step at a time, in the order of the instructions in the `Dockerfile`. Each instruction produces a new image layer, and these layers are cached for reuse in subsequent builds.
1.2 Methods to Improve Image Building Efficiency
- Make rational use of the caching mechanism: When building an image, Docker checks whether a layer for the current instruction already exists in the local cache. If a cached layer matches (same parent layer and same instruction, plus unchanged file contents for `ADD`/`COPY`), Docker reuses it instead of rebuilding. For a `RUN` instruction, only the instruction string is compared. Docker does not re-execute the command or inspect the resulting files to decide whether the cache applies.
- Optimize the structure of the `Dockerfile`: Put instructions that change rarely (such as installing base software packages) near the top. When the `Dockerfile` is modified later, the layers produced by these unchanged instructions can still be reused, which shortens the build. Once one instruction misses the cache, every instruction after it is rebuilt as well. Placing frequently changing steps last therefore preserves as much of the cache as possible.
- Build in stages (multi-stage builds): Split a complex build into several stages, each responsible for one specific task. For example, when an image requires compilation, one stage can compile the code and a later stage can copy only the compiled result into the final image, which reduces the size of the final image.
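The ordering advice can be illustrated with a small Python model. It is a simplification, not Docker's implementation. Each step's cache key is a hash of its parent's key plus the instruction text, so one cache miss cascades to every later step. The helper names `layer_keys` and `rebuilt_steps` are hypothetical. A changed source file is represented by changing the instruction text (`app-v1` to `app-v2`), which stands in for Docker's file checksum.

```python
import hashlib


def layer_keys(instructions, base="scratch"):
    """Chain one cache key per instruction: hash(parent key + instruction)."""
    keys, parent = [], base
    for instruction in instructions:
        key = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(key)
        parent = key  # the next layer is built on top of this one
    return keys


def rebuilt_steps(cached, new):
    """Indices of instructions in `new` whose key is missing from the cache."""
    cache = set(layer_keys(cached))
    return [i for i, key in enumerate(layer_keys(new)) if key not in cache]


# Stable steps first: changing the app only rebuilds the last step.
good_v1 = ["FROM ubuntu:20.04", "RUN apt-get update", "COPY app-v1 /app"]
good_v2 = ["FROM ubuntu:20.04", "RUN apt-get update", "COPY app-v2 /app"]
print(rebuilt_steps(good_v1, good_v2))  # [2]

# Volatile step first: the same change invalidates every later step too.
bad_v1 = ["FROM ubuntu:20.04", "COPY app-v1 /app", "RUN apt-get update"]
bad_v2 = ["FROM ubuntu:20.04", "COPY app-v2 /app", "RUN apt-get update"]
print(rebuilt_steps(bad_v1, bad_v2))  # [1, 2]
```

Note that the `RUN apt-get update` step is identical in both versions of the second file. It is still rebuilt, because its parent key changed.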
II. Commands Related to Docker Images
The Docker client provides a rich set of commands to interact with the Docker daemon to complete various tasks related to images:
- List Images: The `docker images` command lists all images on the Docker host. Use the `-f` parameter to filter the list. For example, `docker images -f dangling=true` lists all untagged (dangling) images.
- Build Images: The `docker build` command builds a new image from a `Dockerfile`. For example, in `docker build -t myimage:latest .`, the `-t` option sets the image's tag. The trailing `.` sets the build context to the current directory, where Docker looks for the `Dockerfile` by default.
- View Image History: The `docker history` command lists the build history of an image. For each layer, it shows the creation time, the instruction that created the layer, and the layer's size.
- Import Images: `docker import` creates a new file system image from a tarball.
- Pull Images: `docker pull` downloads the specified image from an image registry to the local host.
- Push Images: `docker push` uploads a local image to the specified image registry.
- Delete Images: `docker rmi` deletes local images. If an image is referenced by multiple tags, you need to remove the other tags first or use the `-f` parameter to force deletion.
- Save Images: `docker save` saves an image as a `tar` archive, which makes it easy to move the image between environments (the archive can be loaded back with `docker load`).
- Search Images: `docker search` searches Docker Hub for images that match the given criteria.
- Tag Images: `docker tag` adds a tag to an image, which helps with version management and identification.
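The archive written by `docker save` can be inspected without Docker. The sketch below assumes the archive contains a top-level `manifest.json` listing each saved image's `Config`, `RepoTags`, and `Layers`, which is what `docker save` has long written. The function name `read_saved_manifest` and the synthetic demo archive are hypothetical.

```python
import io
import json
import tarfile


def read_saved_manifest(archive):
    """Return tags, config file, and layer files from a `docker save` tar.

    Assumes a top-level manifest.json: a JSON list with one object per image,
    each carrying "Config", "RepoTags", and "Layers" keys.
    `archive` may be a file path or a binary file object.
    """
    if isinstance(archive, str):
        tar = tarfile.open(archive, mode="r")
    else:
        tar = tarfile.open(fileobj=archive, mode="r")
    with tar:
        member = tar.extractfile("manifest.json")  # KeyError if missing
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        entries = json.load(member)
    return [
        {"tags": e.get("RepoTags") or [], "config": e["Config"], "layers": e["Layers"]}
        for e in entries
    ]


def synthetic_archive():
    """Build a tiny in-memory archive with hypothetical names, for the demo."""
    manifest = [{"Config": "cfg.json", "RepoTags": ["myimage:latest"],
                 "Layers": ["aaa/layer.tar", "bbb/layer.tar"]}]
    data = json.dumps(manifest).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("manifest.json")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


print(read_saved_manifest(synthetic_archive()))
```

To try it on a real archive, create one with `docker save -o ubuntu.tar myregistry/ubuntu:20.04`. Calling `read_saved_manifest("ubuntu.tar")` should then list the image's tag and its layer files.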
III. The Download Process of Docker Images (pull Operation)
Docker uses a typical C/S (client/server) architecture: client commands such as `docker pull` are ultimately sent to the Docker daemon (the server side) for processing. When `docker pull` runs, the process is as follows:
- The Docker client assembles the configuration and parameters and sends the `pull` instruction to the Docker server.
- On receiving the instruction, the server passes it to the corresponding handler, which starts a `CmdPull` task (registered when the Docker daemon starts).
- Using the registry address, repository name (repo name), image name, and tag passed in, the Docker daemon locates and downloads the image in the following steps:
  - Get all image IDs in the repository through the `GET /repositories/{repo}/images` interface.
  - Get information about all tags in the repository through the `GET /repositories/{repo}/tags` interface.
  - Find the image ID that the tag points to and download the image:
    - Get the image's ancestry through the `GET /images/{image_id}/ancestry` interface and download its layers one by one. A layer that already exists locally is skipped; otherwise it is downloaded.
    - Get each layer's `json` information through the `GET /images/{image_id}/json` interface.
    - Download each layer's content through the `GET /images/{image_id}/layer` interface.
- After the download completes, the image content is stored in the local UnionFS (Union File System), and the newly downloaded image is registered in the `TagStore`.

These endpoints belong to the legacy Registry v1 API used by early Docker releases. Current Docker Engine versions pull through the Registry HTTP API V2 instead. The daemon fetches the image manifest for the requested tag (`GET /v2/<name>/manifests/<reference>`). It then downloads each layer as a content-addressed blob (`GET /v2/<name>/blobs/<digest>`), again skipping layers that already exist locally.
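The pull steps above can be condensed into a toy Python model. The in-memory `registry` dictionary is a hypothetical stand-in for the HTTP endpoints, and it skips the `GET /repositories/{repo}/images` step. The ancestry ordering (image first, base last) is an assumption. The point is the loop: layers already stored locally are skipped, the rest are fetched base-first, and the tag is registered at the end.

```python
def pull(registry, repo, tag, local_layers, tag_store):
    """Toy model of the pull steps above. `registry` is a hypothetical
    in-memory stand-in for the HTTP endpoints:

      registry["tags"][repo]    -> {tag: image_id}   (GET /repositories/{repo}/tags)
      registry["ancestry"][id]  -> [id, parent, ...] (GET /images/{image_id}/ancestry)
      registry["json"][id]      -> layer metadata    (GET /images/{image_id}/json)
      registry["layer"][id]     -> layer content     (GET /images/{image_id}/layer)

    `local_layers` maps already-stored layer IDs to (metadata, content);
    `tag_store` maps "repo:tag" to an image ID. Returns the downloaded IDs.
    """
    image_id = registry["tags"][repo][tag]
    downloaded = []
    # Assumed ordering: ancestry lists the image first and the base last, so
    # iterate in reverse to store each parent before its children.
    for layer_id in reversed(registry["ancestry"][image_id]):
        if layer_id in local_layers:
            continue  # layer already present locally: skip the download
        metadata = registry["json"][layer_id]
        content = registry["layer"][layer_id]
        local_layers[layer_id] = (metadata, content)
        downloaded.append(layer_id)
    tag_store[f"{repo}:{tag}"] = image_id  # register the image, like TagStore
    return downloaded


# Hypothetical short IDs: a1 is the base layer, c3 the tagged image.
registry = {
    "tags": {"ubuntu": {"20.04": "c3"}},
    "ancestry": {"c3": ["c3", "b2", "a1"]},
    "json": {i: {"id": i} for i in ("a1", "b2", "c3")},
    "layer": {i: f"<{i} data>" for i in ("a1", "b2", "c3")},
}
local_layers = {"a1": ({"id": "a1"}, "<a1 data>")}  # base already present
tag_store = {}
downloaded = pull(registry, "ubuntu", "20.04", local_layers, tag_store)
print(downloaded)  # ['b2', 'c3']
```

Pulling the same tag a second time downloads nothing, because every layer is then already in `local_layers`.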
IV. Storage of Docker Images
4.1 UnionFS and aufs
UnionFS is the basis on which Docker implements layered images. It is a file system service that transparently overlays multiple file system branches into a single unified file system, and it is available on systems such as Linux, FreeBSD, and NetBSD. In Docker, images are stored as layers. Processes in a container see one complete file system, while UnionFS manages the contents of each underlying layer and how the layers stack.
`aufs` (Another UnionFS) is one of the storage drivers Docker has commonly used; others include `devicemapper` and `overlay2`. Users can choose a storage driver that fits their needs, or even implement their own. In current Docker releases `overlay2` is the default, and `aufs` has been deprecated since Docker 20.10.
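The union behaviour described above can be modeled in a few lines of Python: a container sees one file system assembled from read-only image layers plus a writable layer. This is a simplified sketch, not a storage driver, and the class `UnionView` is hypothetical. Lookups go from the top layer down, and writes are copy-on-write into the writable layer. Deletions are recorded as aufs-style `.wh.<name>` whiteout entries.

```python
class UnionView:
    """Toy union mount: read-only image layers plus one writable container layer.

    Each layer is a dict mapping path -> content, listed base first. Deletions
    are recorded as aufs-style ".wh.<name>" whiteout entries in the writable
    layer. A simplified model for illustration, not Docker's code.
    """

    def __init__(self, image_layers):
        self.image_layers = image_layers  # never modified
        self.rw = {}                      # the container's writable layer

    @staticmethod
    def _whiteout(path):
        head, _, name = path.rpartition("/")
        return f"{head}/.wh.{name}"

    def read(self, path):
        # Search from the top layer down; the first match or whiteout wins.
        for layer in [self.rw] + self.image_layers[::-1]:
            if self._whiteout(path) in layer:
                raise FileNotFoundError(path)
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def exists(self, path):
        try:
            self.read(path)
            return True
        except FileNotFoundError:
            return False

    def write(self, path, content):
        # Copy-on-write: the change lands in the writable layer only.
        self.rw.pop(self._whiteout(path), None)
        self.rw[path] = content

    def delete(self, path):
        if not self.exists(path):
            raise FileNotFoundError(path)
        self.rw.pop(path, None)
        self.rw[self._whiteout(path)] = ""  # hide copies in lower layers


base = {"/etc/os-release": "Ubuntu 20.04", "/bin/sh": "<shell>"}
app = {"/run.sh": "echo v1"}
view = UnionView([base, app])
view.write("/run.sh", "echo v2")
view.delete("/bin/sh")
print(view.read("/run.sh"), "|", app["/run.sh"])  # echo v2 | echo v1
```

The image layers `base` and `app` are never changed. The container's edit and deletion live entirely in `view.rw`, which is why many containers can share the same image layers.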
4.2 The Storage Structure of aufs Images
Take the `ubuntu:20.04` image as an example (assuming Docker 20.10.0 with the `aufs` image driver). List it with `docker images`, then view its history with `docker history`:
```text
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
myregistry/ubuntu 20.04 8b24f7a1cb23 2 months ago 256.3 MB
$ docker history 8b24
IMAGE CREATED CREATED BY SIZE
8b24f7a1cb23 2 months ago /bin/sh -c #(nop) CMD ["bash"] 0 B
b17ee223aa89 2 months ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/ 1.9 kB
c18294cc5170 2 months ago /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic 195.5 kB
d4fd76b09ce9 2 months ago /bin/sh -c #(nop) ADD file:0018ff77d038472f52 256.1 MB
511136ea3c5a 3 years ago 0 B
```
As the output shows, the `ubuntu:20.04` image consists of multiple layers. The `aufs` data is stored in the `/var/lib/docker/aufs` directory, which contains three main folders:
- layers: Records which layers each image consists of.
- diff: Stores each layer's changes relative to the layer below it, that is, the actual data of that layer.
- mnt: Holds the mount points that UnionFS exposes. Each running container has a folder here, where its layers are mounted as one unified file system.
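The `layers` and `diff` folders work together when the driver builds a mount. This sketch assumes the aufs driver's layout, in which `layers/<id>` is a plain-text file listing the layer's ancestors one per line, nearest parent first. Real entries use full 64-character IDs; the demo uses the short IDs from the history output above. The function `aufs_branches` is hypothetical.

```python
import os
import tempfile


def aufs_branches(aufs_root, layer_id):
    """Return the diff/ directories stacked for `layer_id`, topmost first.

    Assumed aufs driver layout: layers/<id> is a text file listing the layer's
    ancestor IDs one per line, nearest parent first, and diff/<id> holds the
    files that layer itself added or changed.
    """
    with open(os.path.join(aufs_root, "layers", layer_id)) as f:
        parents = [line.strip() for line in f if line.strip()]
    return [os.path.join(aufs_root, "diff", i) for i in [layer_id, *parents]]


# Demo in a temporary directory, using short IDs from the history output above.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "layers"))
    with open(os.path.join(root, "layers", "c18294cc5170"), "w") as f:
        f.write("d4fd76b09ce9\n511136ea3c5a\n")
    names = [os.path.basename(p) for p in aufs_branches(root, "c18294cc5170")]

print(names)  # ['c18294cc5170', 'd4fd76b09ce9', '511136ea3c5a']
```

The resulting list is the branch order a union mount would use. The topmost layer's `diff` directory comes first, and each ancestor's directory is stacked beneath it.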
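In addition, Docker saves `json` metadata for each image layer under `/var/lib/docker/graph/`. This `graph` directory is the legacy layout; since Docker 1.10, image metadata is kept under `/var/lib/docker/image/<driver>/` instead. For example: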
```json
{
  "id": "8b24f7a1cb23146e20erewtewtertewrwc0f82943f4ab8c097e7",
  "parent": "b17ee223aa89d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
  "created": "2024-12-21T02:11:06.735146646Z",
  "container": "c9a3eda5951d28aa8dbe5qwrqwrewrtw886d0a8e7a710132a38ec",
  "container_config": {
    "Hostname": "43bd710ec89a",
    "Domainname": "",
    "User": "",
    "Memory": 0,
    "MemorySwap": 0,
    "CpuShares": 0,
    "Cpuset": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "PortSpecs": null,
    "ExposedPorts": null,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "/bin/sh",
      "-c",
      "#(nop) CMD [\"bash\"]"
    ],
    "Image": "b17ee223aa89d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "NetworkDisabled": false,
    "MacAddress": "",
    "OnBuild": [],
    "Labels": null
  },
  "docker_version": "20.10.0",
  "config": {
    "Hostname": "43bd710ec89a",
    "Domainname": "",
    "User": "",
    "Memory": 0,
    "MemorySwap": 0,
    "CpuShares": 0,
    "Cpuset": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "PortSpecs": null,
    "ExposedPorts": null,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "bash"
    ],
    "Image": "b17ee223aa89d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "NetworkDisabled": false,
    "MacAddress": "",
    "OnBuild": [],
    "Labels": null
  },
  "architecture": "amd64",
  "os": "linux",
  "Size": 0
}
```
Each layer's directory under `/var/lib/docker/graph/` also contains a file that records the size of that layer.
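Each layer's metadata records its `parent`, so the layer stack can be recovered by following those links. This is essentially what `docker history` displays. A minimal sketch, assuming the metadata has already been parsed into a dictionary keyed by layer ID. The helper `layer_chain` and the demo data built from the short IDs above are hypothetical.

```python
def layer_chain(image_id, metadata):
    """Follow `parent` links from an image's json metadata down to the base.

    `metadata` maps layer ID -> parsed json document (one per layer, as in the
    graph directory described above); the base layer has no parent.
    """
    chain, current = [], image_id
    while current:
        if current in chain:
            raise ValueError(f"parent cycle at {current}")
        chain.append(current)
        current = metadata[current].get("parent")
    return chain


# Hypothetical metadata built from the short IDs in the `docker history` output.
ids = ["8b24f7a1cb23", "b17ee223aa89", "c18294cc5170", "d4fd76b09ce9", "511136ea3c5a"]
metadata = {child: {"id": child, "parent": parent}
            for child, parent in zip(ids, ids[1:] + [None])}
print(layer_chain("8b24f7a1cb23", metadata))  # same order as docker history
```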
V. The Creation and Caching Mechanism of Docker Images
When you create an image with `docker build`, Docker uses its caching mechanism to speed up the build. Take the following `Dockerfile` as an example:
```dockerfile
FROM ubuntu:20.04
RUN apt-get update
ADD run.sh /
VOLUME /data
CMD ["./run.sh"]
```
During the build, Docker processes the instructions in order:
- `FROM` instruction: The Docker daemon first looks for the `ubuntu:20.04` image locally. If it is not present, the daemon pulls it from the image registry along with its `json` metadata file.
- `RUN` instruction: If no cached layer is available, Docker executes `apt-get update`. The resulting file system changes (such as the updated package lists) are saved under the `/var/lib/docker/aufs/diff/` directory, and the `container_config.Cmd` field of the layer's `json` file records the executed instruction. In the next build, Docker checks two things: whether the new layer's `parent` is still `ubuntu:20.04`, and whether the recorded `Cmd` is identical. If both hold, Docker treats the layers as the same and reuses the cached layer instead of rebuilding it.
- `ADD` and `COPY` instructions: For `ADD` or `COPY`, Docker decides whether layers match by computing a hash of the files being added. In the `json` file, the `Cmd` field for an `ADD` instruction records this hash string. The cached layer is reused only when the file contents, file names, and related metadata are all identical.
However, the cache has limitations. Some commands depend on external resources, such as `apt-get update` fetching package lists from remote repositories or `curl` downloading external files. When that external content changes, Docker cannot detect it, because it compares only the instruction string. In that case, pass the `--no-cache` parameter to `docker build` to disable the cache and rebuild the image. When writing a `Dockerfile`, developers should therefore take the caching mechanism into account and follow the official Dockerfile best practices, so that builds are both correct and efficient.
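The cache rules in this section can be sketched as a toy Python model. It is an illustration, not Docker's implementation, and `cache_key` and `simulate_build` are hypothetical names. For `RUN`, only the parent key and the instruction text feed the key. For `ADD`/`COPY`, a checksum of each file is mixed in as well. The demo shows both the reuse and the limitation: `RUN apt-get update` keeps hitting the cache no matter what changed upstream. Only `no_cache=True`, standing in for `--no-cache`, forces every step to rerun.

```python
import hashlib


def cache_key(parent_key, instruction, files=None):
    """Toy cache key for one Dockerfile step (a model, not Docker's code).

    For RUN and most other instructions only the parent and the instruction
    text count. For ADD/COPY a checksum of each source file's name and content
    is mixed in, so editing run.sh changes the key while the text stays equal.
    """
    h = hashlib.sha256(f"{parent_key}\n{instruction}".encode())
    if instruction.split()[0] in ("ADD", "COPY"):
        for name in sorted(files or {}):
            h.update(name.encode())
            h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def simulate_build(steps, cache, no_cache=False):
    """Process (instruction, files) steps in order; report cache hits per step."""
    parent, hits = "ubuntu:20.04", []
    for instruction, files in steps:
        key = cache_key(parent, instruction, files)
        hits.append(not no_cache and key in cache)
        cache.add(key)  # the new layer becomes available for later builds
        parent = key    # the next step builds on this layer
    return hits


cache = set()
v1 = [("RUN apt-get update", None), ("ADD run.sh /", {"run.sh": b"echo v1"})]
v2 = [("RUN apt-get update", None), ("ADD run.sh /", {"run.sh": b"echo v2"})]
print(simulate_build(v1, cache))                 # [False, False]: empty cache
print(simulate_build(v1, cache))                 # [True, True]
print(simulate_build(v2, cache))                 # [True, False]: only ADD reruns
print(simulate_build(v1, cache, no_cache=True))  # [False, False]
```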
VI. The Relationship between Docker Images and Containers
Docker containers are running instances of images. An image holds static file system content, and a container adds dynamic runtime state on top of it. Apart from the file system itself, the settings a container runs with come from the image's `json` file. For example:
- Environment Variables: `ENV FOO=BAR` defines an environment variable that is available while the container runs.
- Data Volumes: A data volume declared with `VOLUME /some/path` is created when the container runs and is not part of the image layer's fixed content.
- Exposed Ports: `EXPOSE 80` records a port the container listens on while running. Publishing it to the host still requires `-p` or `-P` when calling `docker run`.
- Execution Entry: `CMD ["./myscript.sh"]` defines the command executed when the container starts.
When starting a container, the Docker daemon mounts the image's layers as the container's root file system (rootfs). It also reads the dynamic information in the `json` file to configure the container's runtime state, and it manages the container's life cycle and resource allocation. In early Docker versions, each running container was a child process of the Docker daemon. Since Docker 1.11, the daemon starts containers through containerd and runc, so a container's processes are parented by a `containerd-shim` process rather than by the daemon itself.
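The sketch below shows how an image's stored `config` could turn into a container's settings. It is a simplified model that assumes `docker run` semantics: trailing arguments replace `CMD`, and `-e` adds or overrides variables. `effective_config` is a hypothetical helper, and `ENTRYPOINT` and port publishing are not modeled. The image itself is never modified; the overrides only shape the new container.

```python
def effective_config(image_config, cmd=None, env=None):
    """Sketch: combine an image's json `config` with run-time overrides.

    Assumed semantics, mirroring `docker run`: arguments after the image name
    replace CMD, and `-e KEY=VALUE` adds or overrides an environment variable.
    Entrypoint handling and port publishing are deliberately left out.
    """
    env_map = dict(item.split("=", 1) for item in image_config.get("Env") or [])
    env_map.update(env or {})
    return {
        "Cmd": list(cmd) if cmd else list(image_config.get("Cmd") or []),
        "Env": [f"{key}={value}" for key, value in env_map.items()],
        # Volumes and ExposedPorts are JSON objects keyed by path or "port/proto".
        "Volumes": sorted(image_config.get("Volumes") or {}),
        "ExposedPorts": sorted(image_config.get("ExposedPorts") or {}),
    }


# Hypothetical image config matching the ENV/VOLUME/EXPOSE/CMD examples above.
image_config = {
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "FOO=BAR"],
    "Cmd": ["./myscript.sh"],
    "Volumes": {"/some/path": {}},
    "ExposedPorts": {"80/tcp": {}},
}
default = effective_config(image_config)
custom = effective_config(image_config, cmd=["bash"], env={"FOO": "BAZ"})
print(default["Cmd"], custom["Cmd"])  # ['./myscript.sh'] ['bash']
```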
VII. Deletion of Docker Images
Images are stored locally in the layered UnionFS format and can be deleted with the `docker rmi` command. Note the following points when deleting:
- Image Reference Relationship: An image can be referenced by multiple tags. Running `docker rmi` on a tag first removes that tag (an untag operation), and the image itself is deleted only once no tag references it. Deleting an image by ID while several tags still reference it fails unless you remove the other tags first or pass the `-f` parameter to force deletion.
- Deletion of Multi-layer Images: Suppose an image contains multiple layers and its intermediate layers are not referenced by any other image. Deleting the image then also deletes all of those unreferenced layers.
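The reference rules above behave like reference counting, as the toy Python model below shows. It uses the hypothetical function `remove_image` and made-up layer names, and it illustrates the rules rather than reproducing Docker's code. The model untags first and deletes an image only once no tag points at it. Deleting by ID an image that is still tagged more than once requires `force`. The model then deletes unreferenced parent layers until it reaches one that another image or tag still needs.

```python
def remove_image(ref, tags, parents, force=False):
    """Toy model of `docker rmi` on a local store (not Docker's actual code).

    tags:    {"repo:tag": image_id}
    parents: {layer_id: parent_id or None} for every stored layer
    Returns the layer IDs that were deleted.
    """
    if ref in tags:                      # removing by tag: untag first
        image_id = tags.pop(ref)
        if image_id in tags.values():
            return []                    # another tag still references it
    else:                                # removing by image ID
        image_id = ref
        referencing = [t for t, i in tags.items() if i == image_id]
        if len(referencing) > 1 and not force:
            raise RuntimeError(f"{image_id} is referenced by {referencing}; use -f")
        for t in referencing:
            del tags[t]
    deleted, current = [], image_id
    # Delete downward until a layer is still needed by a child or a tag.
    # (Real Docker may instead report a conflict when child images depend
    # on the image being removed.)
    while current is not None:
        has_child = any(p == current for p in parents.values())
        if has_child or current in tags.values():
            break
        deleted.append(current)
        current = parents.pop(current)
    return deleted


parents = {"base": None, "mid": "base", "top": "mid", "other": "base"}
tags = {"app:1.0": "top", "app:latest": "top", "tool:1.0": "other"}
first = remove_image("app:1.0", tags, parents)      # only untags
second = remove_image("app:latest", tags, parents)  # deletes top and mid
print(first, second)  # [] ['top', 'mid']
```

Layer `base` survives because the image tagged `tool:1.0` is still built on it.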
Finally, I would like to recommend a platform that is most suitable for deploying web services: Leapcell