In-Depth Guide to Docker Images

Apr 26, 2025 - 05:57

Leapcell: The Best of Serverless Web Hosting

In-depth Analysis of Docker Images

I. Overview of Docker Images

As the foundation of containers, a Docker image essentially represents the content of a container's file system. It is a read-only template used to create Docker containers. Technically, Docker images use a layered design: except for the base image, every image is produced by overlaying new content on top of an existing image. The metadata of each image layer is stored in a JSON file. This metadata describes not only the static content of the file system but also dynamic information, such as the creation time of the image and the build instruction that produced it.

1.1 Building Docker Images

Docker images are usually built from a Dockerfile, a text file containing a series of instructions that define the base environment of the image, install software packages, copy files, and so on. During the build, Docker processes the instructions in the order in which they appear in the Dockerfile. Each executed instruction produces a new image layer (an instruction that only changes metadata, such as CMD, produces an empty layer), and these layers are cached for reuse in subsequent builds.

1.2 Methods to Improve Image Building Efficiency

Make Rational Use of the Caching Mechanism: When building an image, Docker checks whether the layer that the current instruction would produce already exists in the local cache. If it exists and certain conditions are met (the same parent layer, the same instruction, unchanged file content, and so on), the cached layer is used directly and the step is not rebuilt. For a RUN instruction, for example, Docker compares the instruction text and the parent layer with the cached entry; it does not run the command again to compare results.
Optimize the Structure of the Dockerfile: Put instructions that rarely change (such as installing basic software packages) at the front. A changed instruction invalidates the cache for every instruction after it, so as long as the early instructions remain unchanged, their layers can be reused when the Dockerfile is modified later, which reduces build time.
Build in Stages: Split a complex build into multiple stages (a multi-stage build), each responsible for one specific task. For example, when an image requires a compilation step, the compilation can be completed in one stage, and only the compiled result is then copied into the final image in another stage. This reduces the size of the final image.
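The effect of instruction order on cache reuse can be illustrated with a small model. This is a minimal sketch, not Docker's implementation: it assumes each layer's cache key is a hash of its parent key and its instruction text, and the instruction strings (including the made-up "app.py(v2)" marker for an edited file) are hypothetical.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction; each key depends on all earlier ones."""
    keys = []
    parent = "base"  # stand-in for the base image ID
    for instruction in instructions:
        key = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()[:12]
        keys.append(key)
        parent = key
    return keys


def reusable_layers(old, new):
    """Count the leading layers of `new` that match the cached build `old`."""
    count = 0
    for old_key, new_key in zip(layer_keys(old), layer_keys(new)):
        if old_key != new_key:
            break
        count += 1
    return count


# The same two steps in two different orders.
stable_first = ["RUN apt-get install -y curl", "COPY app.py /app/"]
stable_last = ["COPY app.py /app/", "RUN apt-get install -y curl"]

# Simulate an edit to the frequently changing step.
edited_first = ["RUN apt-get install -y curl", "COPY app.py(v2) /app/"]
edited_last = ["COPY app.py(v2) /app/", "RUN apt-get install -y curl"]

print(reusable_layers(stable_first, edited_first))  # 1: the install layer is reused
print(reusable_layers(stable_last, edited_last))    # 0: everything is rebuilt
```

With the stable instruction first, the expensive installation layer survives the edit; with the order reversed, the changed first step invalidates every later layer.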

II. Commands Related to Docker Images

The Docker client provides a rich set of commands to interact with the Docker daemon to complete various tasks related to images:

List Images: The docker images command lists the images on the Docker host. The -f parameter filters the output. For example, docker images -f dangling=true lists all images that have no tag.
Build Images: The docker build command builds a new image from a Dockerfile. For example, in docker build -t myimage:latest . the -t parameter specifies the name and tag of the image, and the trailing . specifies the current directory as the build context, in which Docker looks for the Dockerfile by default.
View Image History: The docker history command lists the build history of an image, showing the creation time, the executed instruction, and the size of each layer.
Import Images: The docker import command creates a new file system image from a tarball.
Pull Images: The docker pull command downloads the specified image from an image registry to the local host.
Push Images: The docker push command uploads a local image to the specified image registry.
Delete Images: The docker rmi command deletes local images. Deleting by tag removes only that tag as long as other tags still reference the image. To delete an image that is referenced by multiple tags, remove all of its tags first, or use the -f parameter to force deletion.
Save Images: The docker save command saves an image as a tar file, which is convenient for moving the image between environments.
Search Images: The docker search command searches Docker Hub for images that match the given term.
Tag Images: The docker tag command adds a tag to an image, which is convenient for version management and identification.
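For scripting, the commands above can be assembled as argument lists before they are run. The sketch below only builds and prints the command lines; it does not contact a Docker daemon. The helper name docker_cmd and the image names are hypothetical, while the subcommands and flags are the ones described in the list.

```python
import shlex


def docker_cmd(subcommand, *args):
    """Build an argument list for the docker CLI without running it."""
    return ["docker", subcommand, *args]


commands = [
    docker_cmd("images", "-f", "dangling=true"),                # list untagged images
    docker_cmd("build", "-t", "myimage:latest", "."),           # build from the current directory
    docker_cmd("history", "myimage:latest"),                    # show the layers of the image
    docker_cmd("tag", "myimage:latest", "myimage:v1"),          # add a second tag
    docker_cmd("save", "-o", "myimage.tar", "myimage:latest"),  # export to a tar file
    docker_cmd("rmi", "myimage:v1"),                            # remove one tag
]

for argv in commands:
    print(shlex.join(argv))
```

Passing such a list to subprocess.run(argv, check=True) would execute it on a host where the docker client is installed. Keeping the arguments as a list avoids shell quoting problems, including the easy-to-miss space before the build context.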

III. The Download Process of Docker Images (pull Operation)

Docker adopts a typical C/S (Client/Server) architecture. Client commands such as docker pull are ultimately sent to the Docker daemon (the server side) for processing. When docker pull is executed, the process is as follows:

The Docker client assembles the configuration and parameters and sends the pull request to the Docker server.
The server receives the request and hands it to the corresponding handler. The handler starts a CmdPull task, which was registered when the Docker daemon started.
Using the image registry address, repository name (repo name), image name, and tag passed in, the Docker daemon finds and downloads the image through the following steps (these endpoints belong to the legacy Registry v1 API):
- Get all the image IDs under the repository: Through the GET /repositories/{repo}/images interface.
- Get the information of all tags under the repository: Through the GET /repositories/{repo}/tags interface.
- Find the image ID that corresponds to the requested tag and start downloading that image.
- Get the ancestry of the image and download its layers one by one: Through the GET /images/{image_id}/ancestry interface. A layer that already exists locally is skipped; otherwise it is downloaded.
- Get the JSON metadata of each image layer: Through the GET /images/{image_id}/json interface.
- Download the content of each image layer: Through the GET /images/{image_id}/layer interface.
After the download is complete, the daemon stores the image content in the local UnionFS (Union File System) and adds the newly downloaded image to the TagStore.
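The layer-skipping logic in the steps above can be sketched with an in-memory stand-in for the registry. Everything here is hypothetical: the registry dictionary, the shortened layer IDs, and the choice to fetch the base layer first are assumptions of this sketch, and no network request is made.

```python
def pull(tag, registry, local_layers):
    """Return the layer IDs downloaded for `tag`, skipping layers already present."""
    image_id = registry["tags"][tag]           # resolve the tag to an image ID
    ancestry = registry["ancestry"][image_id]  # image first, base layer last
    downloaded = []
    for layer_id in reversed(ancestry):        # fetch the base layer first
        if layer_id in local_layers:
            continue                           # already stored locally, skip it
        local_layers[layer_id] = registry["layers"][layer_id]
        downloaded.append(layer_id)
    return downloaded


# Hypothetical registry content for one repository.
layer_ids = ["8b24", "b17e", "c182", "d4fd", "5111"]
registry = {
    "tags": {"20.04": "8b24"},
    "ancestry": {"8b24": layer_ids},
    "layers": {layer_id: f"<data of {layer_id}>" for layer_id in layer_ids},
}

# Two base layers are already present, for example from another image.
local = {"5111": "<data of 5111>", "d4fd": "<data of d4fd>"}

print(pull("20.04", registry, local))  # ['c182', 'b17e', '8b24']
print(pull("20.04", registry, local))  # []: everything is now local
```

Because layers are shared, pulling a second image built on the same base only transfers the layers that differ.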

IV. Storage of Docker Images

4.1 UnionFS and aufs

UnionFS is the basis on which Docker implements layered images. It is a file system service, available on systems such as Linux, FreeBSD, and NetBSD, that transparently overlays multiple file system branches to form a single unified file system. In Docker, images are stored as layers. The application sees one complete file system, while underneath, UnionFS manages the content of each image layer and the relationships between layers.

aufs (Another UnionFS) is one of the storage drivers that Docker has commonly used. Others include devicemapper and overlay2. Users can choose an appropriate storage driver according to their needs, or even implement their own driver.
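How a union file system resolves a path across layers can be modeled with Python's collections.ChainMap, which searches a stack of mappings from first to last. This is only an illustrative model: each layer is a plain dictionary, the file names are hypothetical, and the WHITEOUT marker stands in for the whiteout entries that union file systems use to hide a file deleted in an upper layer.

```python
from collections import ChainMap

WHITEOUT = object()  # marker for a file deleted in an upper layer

# Three read-only layers, lowest first.
base = {"/bin/sh": "shell binary", "/etc/motd": "welcome"}
middle = {"/etc/apt/sources.list": "package sources"}
top = {"/etc/motd": WHITEOUT, "/run.sh": "#!/bin/sh"}

# ChainMap searches its maps left to right, so the topmost layer goes first.
union = ChainMap(top, middle, base)


def read(path):
    """Return the file content visible in the merged view, or None if absent."""
    content = union.get(path)
    return None if content is WHITEOUT else content


def listing():
    """Return the paths visible in the merged view, hiding whited-out files."""
    return sorted(path for path in union if union[path] is not WHITEOUT)


print(read("/bin/sh"))    # found in the base layer
print(read("/etc/motd"))  # None: hidden by the top layer
print(listing())
```

The lower layers are never modified; the top layer only masks or adds entries, which is why several images can share the same base layers.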

4.2 The Storage Structure of aufs Images

Take the ubuntu:20.04 image as an example (assuming the current Docker version is 20.10.0 and the storage driver is aufs). Use docker images to find the image and docker history to view its history:

$ docker images
REPOSITORY                TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
myregistry/ubuntu         20.04               8b24f7a1cb23        2 months ago        256.3 MB
$ docker history 8b24
IMAGE               CREATED             CREATED BY                                      SIZE
8b24f7a1cb23        2 months ago        /bin/sh -c #(nop)  CMD ["bash"]                 0 B
b17ee223aa89        2 months ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.9 kB
c18294cc5170        2 months ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   195.5 kB
d4fd76b09ce9        2 months ago        /bin/sh -c #(nop) ADD file:0018ff77d038472f52   256.1 MB
511136ea3c5a        3 years ago                                                         0 B

It can be seen that the ubuntu:20.04 image contains multiple layers. The aufs data is stored in the /var/lib/docker/aufs directory, which contains three main folders:

layers: Records which layers each image consists of.
diff: Stores the difference content between each image and the previous image, that is, the actual data of the current image layer.
mnt: As the mount point provided by the UnionFS to the outside, each running container has a corresponding folder in this directory, which is used to provide a unified file access interface.

Docker also saves JSON metadata for each image layer, stored in /var/lib/docker/graph//json, for example:

{
  "id": "8b24f7a1cb23146e20erewtewtertewrwc0f82943f4ab8c097e7",
  "parent": "b17ee223aa89d1b136ea55eqweqweqwrewra6c88d93e1ad7c",
  "created": "2024-12-21T02:11:06.735146646Z",
  "container": "c9a3eda5951d28aa8dbe5qwrqwrewrtw886d0a8e7a710132a38ec",
  "container_config": {
    "Hostname": "43bd710ec89a",
    "Domainname": "",
    "User": "",
    "Memory": 0,
    "MemorySwap": 0,
    "CpuShares": 0,
    "Cpuset": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "PortSpecs": null,
    "ExposedPorts": null,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "/bin/sh",
      "-c",
      "#(nop)  CMD [\"bash\"]"
    ],
    "Image": "b17ee223aa89d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "NetworkDisabled": false,
    "MacAddress": "",
    "OnBuild": [],
    "Labels": null
  },
  "docker_version": "20.10.0",
  "config": {
    "Hostname": "43bd710ec89a",
    "Domainname": "",
    "User": "",
    "Memory": 0,
    "MemorySwap": 0,
    "CpuShares": 0,
    "Cpuset": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "PortSpecs": null,
    "ExposedPorts": null,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "bash"
    ],
    "Image": "b17ee223aa89qwrewtretgertwerewrq6dea6c88d93e1ad7c",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "NetworkDisabled": false,
    "MacAddress": "",
    "OnBuild": [],
    "Labels": null
  },
  "architecture": "amd64",
  "os": "linux",
  "Size": 0
}

At the same time, the /var/lib/docker/graph//layersize file saves the size information of the image layer.
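The parent field is what links the per-layer metadata into a chain, and following it reproduces the kind of listing that docker history prints. The sketch below works on a trimmed, hypothetical set of metadata records held in memory; the shortened IDs and sizes are illustrative and nothing is read from /var/lib/docker.

```python
# Trimmed, hypothetical metadata records keyed by shortened layer ID.
graph = {
    "8b24": {
        "parent": "b17e",
        "Size": 0,
        "container_config": {"Cmd": ["/bin/sh", "-c", '#(nop)  CMD ["bash"]']},
    },
    "b17e": {
        "parent": "5111",
        "Size": 1900,
        "container_config": {"Cmd": ["/bin/sh", "-c", "sed -i ..."]},
    },
    "5111": {"Size": 0, "container_config": {"Cmd": None}},  # base layer, no parent
}


def history(graph, image_id):
    """Follow parent links from `image_id` down to the base layer."""
    rows = []
    current = image_id
    while current is not None:
        meta = graph[current]
        cmd = meta["container_config"]["Cmd"] or []
        rows.append((current, " ".join(cmd), meta["Size"]))
        current = meta.get("parent")  # None ends the walk at the base layer
    return rows


for layer_id, created_by, size in history(graph, "8b24"):
    print(f"{layer_id:6} {created_by:40} {size} B")

total = sum(size for _, _, size in history(graph, "8b24"))
print("total size:", total, "B")
```

The image size is the sum of its layer sizes, which is why metadata-only layers such as the CMD layer contribute 0 B.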

V. The Creation and Caching Mechanism of Docker Images

When using docker build to create an image, Docker will use the caching mechanism to improve the building efficiency. Take the following Dockerfile as an example:

FROM ubuntu:20.04
RUN apt-get update
ADD run.sh /
VOLUME /data
CMD ["./run.sh"]

During the building process, Docker will execute according to the order of the instructions:

Process the FROM Instruction: The Docker daemon first looks for the ubuntu:20.04 image locally. If it does not exist, the daemon pulls it from the image registry and obtains its JSON metadata file.
Process the RUN Instruction: If no cached layer is available, Docker executes apt-get update and saves the resulting file system changes (such as the updated package lists) in the /var/lib/docker/aufs/diff// directory. The container_config.Cmd field in the JSON file records the executed instruction. On the next build, if the parent of the new layer is still ubuntu:20.04 and the instruction matches the Cmd recorded in the JSON file, Docker treats the layers as identical and reuses the existing layer instead of rebuilding it.
Process the ADD and COPY Instructions: For ADD or COPY, Docker decides whether a layer can be reused by calculating a hash of the files involved. The Cmd field recorded for an ADD instruction in the JSON file contains this hash string. The layer is reused only when the file content, file name, and so on are exactly the same.

However, the caching mechanism has limitations. For commands that depend on external resources (such as apt-get update, which fetches package lists from external software sources, or curl, which downloads external files), Docker cannot detect that the external content has changed, because the instruction text is still the same. In that case, use the --no-cache parameter of docker build to disable the cache and rebuild the image. When writing a Dockerfile, developers therefore need to keep the caching mechanism in mind and follow the official best practices so that image builds are both correct and efficient.
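The matching rules described in this section, and the blind spot for external resources, can be sketched as a lookup table. This is a simplified model under stated assumptions rather than Docker's code: the key for a step is a hash of the parent layer ID and the instruction text, plus a content hash for ADD and COPY, and the use_cache flag plays the role of --no-cache. The function names and the layer ID "ubuntu-20.04" are hypothetical.

```python
import hashlib


def cache_key(parent_id, instruction, file_bytes=None):
    """Key for a build step: parent layer + instruction (+ file hash for ADD/COPY)."""
    parts = [parent_id, instruction]
    if file_bytes is not None:
        parts.append(hashlib.sha256(file_bytes).hexdigest())
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


cache = {}  # cache key -> layer ID


def build_step(parent_id, instruction, file_bytes=None, use_cache=True):
    """Return (layer_id, cache_hit) for one instruction."""
    key = cache_key(parent_id, instruction, file_bytes)
    if use_cache and key in cache:
        return cache[key], True
    layer_id = key[:12]  # stand-in for actually running the step
    cache[key] = layer_id
    return layer_id, False


base = "ubuntu-20.04"
run1, hit1 = build_step(base, "RUN apt-get update")
# Same parent and same text: a hit, even if the remote package index has changed.
run2, hit2 = build_step(base, "RUN apt-get update")
add1, hit3 = build_step(run2, "ADD run.sh /", b"echo v1")
# Same instruction text but different file content: a miss.
add2, hit4 = build_step(run2, "ADD run.sh /", b"echo v2")
# Equivalent of --no-cache: the step is rebuilt regardless.
run3, hit5 = build_step(base, "RUN apt-get update", use_cache=False)

print(hit1, hit2, hit3, hit4, hit5)
```

The second RUN step is a cache hit purely because its text and parent are unchanged, which is exactly the situation in which stale package lists can end up in an image.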

VI. The Relationship between Docker Images and Containers

Docker containers are running instances of images. An image contains static file system content, and a container adds dynamic runtime state on top of it. The information needed at run time, apart from the file system content itself, is stored in the image's JSON file. For example:

Environment Variables: Such as ENV FOO=BAR, which defines the environment variables during the running of the container.
Data Volumes: The container data volumes declared by VOLUME /some/path are dynamically added during the running of the container and are not the fixed content of the image layer.
Exposed Ports: EXPOSE 80 records the ports that the container needs to expose to the outside during running.
Execution Entry: CMD ["./myscript.sh"] defines the command to be executed when the container starts.

When starting a container, the Docker daemon uses the image content as the container's root file system (rootfs) and reads the dynamic information in the JSON file to configure the container's runtime state. In the architecture described here, each running container is a child process of the Docker daemon, which manages the container's life cycle and resource allocation.
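The way runtime settings are derived from the image metadata can be sketched as a merge of image defaults with per-container overrides. This is a minimal illustration, assuming a config dictionary shaped like the config section of the JSON file shown earlier; the function container_config and the override values are hypothetical and this is not the daemon's actual code.

```python
def container_config(image_config, cmd=None, env=None):
    """Combine the defaults stored in the image with per-container overrides."""
    merged_env = dict(item.split("=", 1) for item in image_config.get("Env") or [])
    merged_env.update(env or {})  # overrides replace image values with the same name
    return {
        "Env": [f"{name}={value}" for name, value in merged_env.items()],
        "Cmd": cmd if cmd is not None else image_config.get("Cmd"),
        "Volumes": image_config.get("Volumes") or {},
        "ExposedPorts": image_config.get("ExposedPorts") or {},
    }


# Values corresponding to ENV FOO=BAR, VOLUME /some/path, EXPOSE 80, CMD ["./myscript.sh"].
image_config = {
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "FOO=BAR"],
    "Cmd": ["./myscript.sh"],
    "Volumes": {"/some/path": {}},
    "ExposedPorts": {"80/tcp": {}},
}

default = container_config(image_config)
override = container_config(image_config, cmd=["bash"], env={"FOO": "BAZ"})

print(default["Cmd"], default["Env"][1])
print(override["Cmd"], override["Env"][1])
```

The image itself is not changed by an override; two containers started from the same image can run different commands with different environment variables.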

VII. Deletion of Docker Images

Images are stored locally as UnionFS layers and can be deleted with the docker rmi command. Note the following points when deleting:

Image Reference Relationship: Images have a concept of "reference": one image can be referenced by multiple tags. Deleting an image by tag first removes that tag (an untag operation). If other tags still reference the image, the image data is kept; to delete the image itself, remove all of its tags first or use the -f parameter to force deletion.
Deletion of Multi-layer Images: If an image contains multiple layers and the middle layers are not referenced by other images, when deleting this image, all the unreferenced image layers will be deleted together.
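Both rules can be modeled as reference counting over tags and parent links. The following is a minimal sketch under stated assumptions, not the daemon's implementation: tags and layers are plain dictionaries, the names are hypothetical, and a layer is considered in use if any tag points at it or any remaining layer has it as its parent.

```python
def rmi(tag, tags, parents):
    """Remove `tag`; return the layer IDs deleted because nothing references them.

    tags:    dict of tag -> image ID (mutated)
    parents: dict of layer ID -> parent layer ID or None (mutated)
    """
    image_id = tags.pop(tag)  # untag first
    deleted = []
    current = image_id
    while current is not None:
        still_tagged = current in tags.values()
        has_children = current in parents.values()
        if still_tagged or has_children:
            break  # the layer is still in use, stop here
        parent = parents.pop(current)
        deleted.append(current)
        current = parent  # continue with the layer below
    return deleted


# Two images share the layers "mid" and "base"; app1 carries two tags.
parents = {"base": None, "mid": "base", "app1": "mid", "app2": "mid"}
tags = {"app:1": "app1", "app:latest": "app1", "other:1": "app2"}

print(rmi("app:1", tags, parents))       # []: app:latest still references app1
print(rmi("app:latest", tags, parents))  # ['app1']: mid is still the parent of app2
print(rmi("other:1", tags, parents))     # ['app2', 'mid', 'base']
```

Shared layers are removed only when the last image that depends on them is deleted.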

Finally, I would like to recommend a platform that is most suitable for deploying web services: Leapcell