
Docker Build: The Ultimate Guide on Building Docker Images [2022]

In this guide we will look at the docker build command in depth. The article covers the following topics:

  • Docker images creation
  • Using base images to create your own
  • Multistage builds
  • Docker image size and build time optimizations
  • Image names and tags
  • Testing the image
  • Publishing to docker public and private registries
  • Debugging docker manifest
  • Purge (cleaning up) old and unused images
  • Running a container from built image

What does the docker build command do?

A Docker image is a snapshot of an operating system's filesystem (usually Linux) prepared to run in an isolated environment called a container. A layer is a slice of the filesystem, a set of directories and files; layers stack one on top of another to form an image.

The rest is hard to follow without examples, so we will create an experimental image to try different features, keeping it minimal for each feature under review.

Docker builds images from a series of layers. A file, usually called Dockerfile, contains the instructions that build the layers of the image. Each line in the Dockerfile represents one instruction.

# copies ./src from your machine inside /app directory of the image
COPY ./src /app

# execute the install `bash` package command and save resulting files to the image
RUN apk add --no-cache bash

If an instruction changes files that came from previous layers, docker creates a new layer that holds the modified versions of those files. A new layer is also created when an instruction adds files to the filesystem or deletes them. An instruction that touches no files adds nothing to the size of the image.

# Will create new layer
COPY . /base

# Will NOT create new layer
RUN echo 'Copied the app, hurray!'

This strategy is called copy-on-write (CoW): files in existing layers are never modified in place, and a changed file is written to a new layer instead. If several instructions change the filesystem by modifying or adding files, each of them creates a new layer.

# Will add new layer
COPY . /app
# Will add a new layer from built files
RUN make /app
# Not modifying anything, no new layer created
RUN echo 'Build completed, cache files are stored in /app/.cache directory'
# Will create a new layer but /app/.cache will still be stored on previous layer
RUN rm -rf /app/.cache

Let's call the active layer "current" and the previous layers "finished". It's an important distinction, because all finished layers are immutable (read-only) and only the current layer is writable. Now the limitation of the CoW strategy is much easier to understand: docker never removes files from finished layers once they become immutable. If you add files in one layer and remove them in the next, these files will be absent from the final filesystem but will still increase the size of the image. That is why it's important to delete files in the same instruction that created them.

# Will add new layer
COPY . /app
# Will add a new layer from built files and remove the cache files from it at once
RUN make /app && rm -rf /app/.cache

# Not modifying anything, no new layer created
RUN echo 'Build completed, cache files are removed and not stored in any layer'
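
To make the size argument concrete, here is a minimal sketch of the idea in Python. It is a toy model, not how docker stores layers: every layer is a dictionary of path to size, None stands for a deletion marker, and all paths and sizes are made up for illustration.

```python
# Toy model of image layers: each layer maps a path to a size in bytes,
# and None marks a path that the layer deletes.
# The paths and sizes are hypothetical, chosen only for illustration.

def image_size(layers):
    """Bytes stored in the image: every layer keeps everything it added."""
    return sum(size for layer in layers for size in layer.values() if size is not None)


def visible_files(layers):
    """Files a container sees after all layers are stacked in order."""
    merged = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                merged.pop(path, None)  # hidden by a later layer, not erased
            else:
                merged[path] = size
    return merged


# RUN make /app, then RUN rm -rf /app/.cache as a separate instruction
separate = [
    {"/app/main.c": 10_000},
    {"/app/bin/app": 50_000, "/app/.cache/objects": 200_000},
    {"/app/.cache/objects": None},
]

# RUN make /app && rm -rf /app/.cache in a single instruction
combined = [
    {"/app/main.c": 10_000},
    {"/app/bin/app": 50_000},
]

print(visible_files(separate) == visible_files(combined))  # same files in the container
print(image_size(separate), image_size(combined))          # different stored size
```

Both variants give the container the same files, but the first one still carries the cache files in its second layer.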

How docker creates layers when building an image

The instructions in the Dockerfile are executed in order, line by line, and every instruction that changes files creates a new layer. When we start the build process with the docker build command, docker reads the first instruction and creates a temporary container to execute it.

Since there are no previous instructions, the layer stack is empty and the filesystem is clean. After the instruction finishes, docker checks the filesystem for changed files, much like git checks a repository for changes with the diff command. If any changes exist, docker creates a directory with a generated SHA256 name and writes a diff directory inside it. Once it is saved to disk the layer becomes read-only, and the temporary container is stopped and scheduled for removal.

Layers are identified by a SHA256 string, like 77ad1ae9f12f323484f0b967c9173d24411ccbc2ba861e56201154ed4518e4d7, and are stored locally with other docker resources in the /var/lib/docker/overlay2/ folder (on Linux; with Docker Desktop on macOS this folder lives inside the Linux virtual machine). You don't really want to debug an image through the filesystem, as it's hard to trace all the SHA256 names and easy to get lost. We will review the available investigation tools in the debugging chapter.

tree -F /var/lib/docker/overlay2/{SHA256}/diff
# this layer added an app folder with one file in it
# /var/lib/docker/overlay2/{SHA256}/diff
# └── app/
#     └── recipe.json

For the second instruction docker creates another temporary container, but this time its filesystem is taken from the layer that was just created. After the execution the resulting diff is saved to a new SHA256-named directory. The new directory includes a file named lower, which lists its parent layers. It also includes a merged directory, which contains the unified content of the two layers: the parent layer and itself. The finished container is scheduled for deletion.

cat /var/lib/docker/overlay2/{SHA256}/lower
# this layer has two ancestors

And so on until we get to the last instruction of the Dockerfile. If you want to observe how docker creates new layers, you can use the docker diff command. In the following example I use docker build with the --rm=false option to keep the temporary containers while building. Currently (Docker version 20.10) BuildKit ignores the --rm option, so I disable the BuildKit engine with the environment variable DOCKER_BUILDKIT:

cat ./Dockerfile
# FROM busybox
# RUN mkdir /data
# RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one
# RUN chmod -R 0777 /data
# RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two
# RUN chmod -R 0777 /data
# RUN rm /data/one
# CMD ls -lah /data

DOCKER_BUILDKIT=0 docker build --rm=false .
# ...
# Step 2/8 : RUN mkdir /data
#  ---> Running in cb00a7f3dbc1
#  ---> ab87ab044d69
# Step 3/8 : RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one
#  ---> Running in 0077dfb8b4e0
# 1048576 bytes (1.0MB) copied, 0.005491 seconds, 182.1MB/s
#  ---> ad6ad189b59f
# Step 4/8 : RUN chmod -R 0777 /data
#  ---> Running in 8767c6569d9e
#  ---> c125f446abf2
# Step 5/8 : RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two
#  ---> Running in f1fd6a1bfab6
# 1048576 bytes (1.0MB) copied, 0.004977 seconds, 200.9MB/s
#  ---> df49c14999cf
# Step 6/8 : RUN chmod -R 0777 /data
#  ---> Running in 1d27ffe5388d
#  ---> 571afaa3ec54
# Step 7/8 : RUN rm /data/one
#  ---> Running in 34a9e0c3bf51
#  ---> 0c619edc5d7a
# Step 8/8 : CMD ls -lah /data
#  ---> Running in a7a1e5da0bd0
#  ---> 5bc1de223247
# Successfully built 5bc1de223247

docker diff cb00a7f3dbc1
# A /data
docker diff 0077dfb8b4e0
# C /data
# A /data/one
docker diff 8767c6569d9e
# C /data
# C /data/one
docker diff f1fd6a1bfab6
# C /data
# A /data/two
docker diff 1d27ffe5388d
# C /data
# C /data/two
docker diff 34a9e0c3bf51
# C /data
# D /data/one
docker diff a7a1e5da0bd0

The final file structure after merging all layers is a docker image. An image can be used to run containers locally with the docker run <image> command, or distributed through private and public registries.
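
The merge can be replayed by hand. The sketch below takes the docker diff output from the example above (A for added, C for changed, D for deleted) and applies it step by step. It only tracks the paths touched by our Dockerfile, not the files that come from busybox, and it is a simplified illustration rather than the way docker merges layers internally.

```python
# The `docker diff` output of the seven temporary containers, in build order.
DIFFS = [
    ["A /data"],
    ["C /data", "A /data/one"],
    ["C /data", "C /data/one"],
    ["C /data", "A /data/two"],
    ["C /data", "C /data/two"],
    ["C /data", "D /data/one"],
    [],  # CMD changes only metadata, the diff is empty
]


def merge(diffs):
    """Apply the diffs in order and return the paths that remain visible."""
    paths = set()
    for diff in diffs:
        for entry in diff:
            kind, path = entry.split(" ", 1)
            if kind == "D":
                paths.discard(path)
            else:  # "A" and "C" both leave the path present
                paths.add(path)
    return sorted(paths)


print(merge(DIFFS))  # ['/data', '/data/two']
```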

Docker Layer Cache (DLC)

Docker creates container images using layers. Each instruction in a Dockerfile that changes the filesystem creates a new layer. Each layer contains the filesystem changes between the state before the instruction was executed and the state after it.

Docker uses a layer cache (DLC) to optimize and speed up the process of building docker images. Every layer is stored locally, together with additional files that describe the instruction's file diff. If the instruction and the files it depends on stay the same, the layer is reused as is from our local system. In the docker build output a cached layer looks similar to this:

# First run, no cache

docker build -f ./Dockerfile.dd . 
# [+] Building 14.7s (12/12) FINISHED
# ...
# => [2/7] RUN mkdir /data                                         0.5s
# => [3/7] RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one     0.6s
# => [4/7] RUN chmod -R 0777 /data                                 0.7s
# => [5/7] RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two     0.6s
# => [6/7] RUN chmod -R 0777 /data                                 0.6s
# => [7/7] RUN rm /data/one                                        0.6s
# => exporting to image                                            0.3s
# => => exporting layers                                           0.3s
# => => writing image sha256:5da363609d5866eed38bf75674235b0e057a70f535f9acb6b588657d74617893

# Second run, cached layers

docker build -f ./Dockerfile.dd .
# [+] Building 1.7s (11/11) FINISHED
# ...
# => CACHED [2/7] RUN mkdir /data                                      0.0s
# => CACHED [3/7] RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one  0.0s
# => CACHED [4/7] RUN chmod -R 0777 /data                              0.0s
# => CACHED [5/7] RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two  0.0s
# => CACHED [6/7] RUN chmod -R 0777 /data                              0.0s
# => CACHED [7/7] RUN rm /data/one                                     0.0s
# => exporting to image                                                0.0s
# => => exporting layers                                               0.0s
# => => writing image sha256:5da363609d5866eed38bf75674235b0e057a70f535f9acb6b588657d74617893

The first run takes 14.7 seconds to build; the second run uses the cache and takes only 1.7 seconds. If you change the Dockerfile at step 5/7, all layers before step 5/7 are taken from the cache, and docker rebuilds every layer from step 5/7 onward.
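
The "everything after the first change is rebuilt" rule can be sketched in a few lines of Python. This is a simplified model that compares only the instruction text; the real cache also looks at the content of the files used by COPY and ADD. The function name cached_steps is mine, not part of docker.

```python
def cached_steps(previous, current):
    """Return one boolean per instruction in `current`: True if its layer can be reused."""
    result = []
    still_valid = True
    for index, instruction in enumerate(current):
        # The first mismatch invalidates this step and every step after it.
        if still_valid and (index >= len(previous) or previous[index] != instruction):
            still_valid = False
        result.append(still_valid)
    return result


previous = [
    "FROM busybox",
    "RUN mkdir /data",
    "RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one",
    "RUN chmod -R 0777 /data",
    "RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two",
    "RUN chmod -R 0777 /data",
    "RUN rm /data/one",
]

# Same Dockerfile with step 5/7 changed (count=2048 instead of count=1024)
current = list(previous)
current[4] = "RUN dd if=/dev/zero bs=1024 count=2048 of=/data/two"

for step, (instruction, cached) in enumerate(zip(current, cached_steps(previous, current)), 1):
    print(f"[{step}/7] {'CACHED ' if cached else 'REBUILD'} {instruction}")
```

Steps 1 to 4 are reported as cached; steps 5 to 7 are rebuilt even though the text of steps 6 and 7 did not change.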

Overview of Dockerfile

In this section we will briefly go over the Dockerfile instructions. The Docker documentation should be your handbook for creating a docker image; I just want to focus on caveats and tips. The Dockerfile in the previous example had three instructions, FROM, RUN and CMD, so let's begin with them:

FROM busybox

RUN mkdir /data
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one
RUN chmod -R 0777 /data
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two
RUN chmod -R 0777 /data
RUN rm /data/one

CMD ls -lah /data


The FROM <image> instruction sets the base image. A valid Dockerfile must start with a FROM instruction (only comments, parser directives and ARG may come before it). The base image might be

  • a dockerized operating system such as ubuntu, centos or alpine
  • pre-installed software image maintained by other people/companies, such as node, php, python
  • scratch - an empty filesystem without an OS installed, in case you want to build something from, say, scratch

Our example image busybox is maintained by the Docker Community. This image is small (~5 MB) and includes many common UNIX utilities.

The instruction accepts more optional arguments:

FROM [--platform=<platform>] <image>[:<tag> | @<digest>] [AS <name>]
# Example:
FROM --platform=linux/amd64 ubuntu:22.04 AS base

  • --platform=<platform> flag can be used to specify the platform in case <image> references a multi-platform image. By default the build uses the platform of the build request, which normally matches the machine running the build. Official Docker images are published (in 2022) for these architectures:
    • linux/amd64 (alias amd64) - 64-bit x86
    • linux/arm32v5 (alias arm32v5)
    • linux/arm32v6 (alias arm32v6)
    • linux/arm32v7 (alias arm32v7)
    • linux/arm64v8 (alias arm64v8) - 64-bit ARM
    • linux/i386 (alias i386) - 32-bit x86
    • linux/mips64le (alias mips64le) - MIPS 64-bit, little-endian
    • linux/ppc64le (alias ppc64le) - PowerPC 64-bit, little-endian
    • linux/riscv64 (alias riscv64) - RISC-V 64-bit, little-endian
    • linux/s390x (alias s390x) - IBM System z 64-bit, big-endian
    • windows/amd64 (alias windows-amd64)
  • :tag or @digest points to a specific version of the image. By default the tag :latest is used. Try to avoid latest and use an exact tag in your images and containers.
  • name is a temporary local name of the build stage, used to refer to it in multi-stage builds.
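
To show how the image part of FROM breaks down, here is a rough sketch of a reference parser in Python. It is a simplification I wrote for illustration, not docker's own parser: it only separates the name, the tag and the digest, and falls back to latest when neither a tag nor a digest is given. The registry address localhost:5000 in the usage lines is a hypothetical example.

```python
def parse_image_reference(reference):
    """Split '<image>[:tag][@digest]' into (name, tag, digest)."""
    name, _, digest = reference.partition("@")
    # A colon counts as a tag separator only in the last path segment,
    # otherwise it is the port of a registry host.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, _, tag = name.rpartition(":")
    else:
        tag = None
    if tag is None and not digest:
        tag = "latest"  # implicit default, the one this guide advises against
    return name, tag, digest or None


print(parse_image_reference("busybox"))
print(parse_image_reference("ubuntu:22.04"))
print(parse_image_reference("localhost:5000/app"))
```

The first and the last call both report the tag latest, which is exactly the implicit behavior you avoid by pinning an exact tag.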


The RUN instruction executes a command in a shell on top of the current image, in a new layer, and commits the result. The resulting layer is used by the next instruction of the Dockerfile. The default shell is /bin/sh -c on Linux and cmd /S /C on Windows; you can change it with the SHELL instruction.

# shell form
RUN <command>
# Example:
RUN /bin/bash -c 'echo hello'

# exec form
RUN ["executable", "param1", "param2"]
# Example:
RUN ["/bin/bash", "-c", "echo hello"]

A good practice is to combine a sequence of RUN instructions into one, using the AND (&&) shell operator. It reduces the number of layers in your docker image.

RUN apt-get update && \
    apt-get install -y bzr cvs git mercurial subversion && \
    rm -rf /var/lib/apt/lists/*


The SHELL instruction changes the default shell used to execute RUN. It is rarely used; if the standard sh -c shell does not work for your case, it is recommended to call the shell of your choice explicitly inside the RUN instruction.

SHELL ["/bin/bash", "-c"]
RUN echo "I am now using bash!"

# Can be replaced by just
RUN /bin/bash -c 'echo "I am using bash!"'


COPY [--chown=<user>:<group>] <src1> <src2> ... <dest>
# Copy two files from current local directory inside container's /app directory
COPY ./package.json ./yarn.lock /app/

# or
COPY [--chown=<user>:<group>] ["<src1>", "<src2>", ..., "<dest>"]
# Same as previous example
COPY ["./package.json", "./yarn.lock", "/app/"]

The COPY instruction copies files from one or more <src> paths in the build context on your local filesystem and puts them into <dest> inside the image. You can use pattern syntax to match files:

  • * - any sequence except path separator, for example "/home/catch/*" to match "/home/catch/foo", but not "/home/catch/foo/bar"
  • ? - any single symbol, for example "/home/?opher" to match "/home/gopher"
  • [a-z] - single symbol within range, examples:
    • [ab] - the character a or b
    • [^ab] - any character except a or b
    • [A-Za-z0-9] - any character from A to Z or a to z or 0 to 9
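
If you want to try these rules without building anything, the following Python sketch translates such a pattern into a regular expression. It is my own approximation of the rules listed above, not the matcher docker uses; in particular I did not use Python's fnmatch module, because its * also matches the path separator.

```python
import re


def pattern_to_regex(pattern):
    """Translate a COPY-style pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "*":
            parts.append("[^/]*")  # any sequence except the path separator
        elif char == "?":
            parts.append("[^/]")   # any single symbol except the path separator
        elif char == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated character class")
            parts.append(pattern[i:end + 1])  # [ab], [^ab], [a-z] are valid regex as is
            i = end
        else:
            parts.append(re.escape(char))
        i += 1
    return re.compile("".join(parts))


def matches(pattern, path):
    return pattern_to_regex(pattern).fullmatch(path) is not None


print(matches("/home/catch/*", "/home/catch/foo"))      # True
print(matches("/home/catch/*", "/home/catch/foo/bar"))  # False
print(matches("/home/?opher", "/home/gopher"))          # True
```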


The ADD instruction is very similar to COPY, with a few important exceptions:

  • src can be a URL. In this case ADD downloads the file from the URL and places it inside the image. Authorization, custom headers and saving under a different name are not supported. I recommend using RUN curl or RUN wget for flexibility and security reasons.
  • if src is a local .tar archive, ADD unpacks it into dest. The archive may be compressed with gzip, bzip2 or xz
  • if the destination doesn't exist, ADD creates all missing folders in its path

ADD [--chown=<user>:<group>] <src1> <src2> ... <dest>
# Download the .eslintrc.js file from URL to the container's /app directory
# and copy two files from local filesystem in the same folder
ADD ./package.json ./yarn.lock /app/

# Same as previous example
ADD ["",\

\ is the escape character. It keeps docker from reading the next line as a separate instruction by escaping the following character, which in the example is the newline character \n. The Dockerfile parser then joins the lines together while reading the file. It's a nice trick to keep your Dockerfile readable.

It may seem tempting to use ADD instead of COPY. But ADD is less secure and more error-prone, so a good practice is to avoid it. The docker community prefers RUN wget for downloading files and RUN tar -x for unpacking them. By using COPY and RUN instead of ADD you avoid several problems:

  • Permissions. ADD downloads the file with mode 600
  • Man-in-the-middle attacks. ADD supports neither authorization nor a checksum to ensure that the downloaded file is exactly the one you expect.
  • Unpacking a tar archive merges its content with existing files at the path instead of replacing the path. There are no options to change that.
  • Back in 2014 it was decided that the ADD instruction would not be extended with new features or fixes for existing concerns. There are numerous complaints about ADD being unstable with XZ archives.
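
The checksum point deserves a tiny illustration. The sketch below shows the comparison that a download step should perform before the file is trusted. The payload and the "published" checksum are stand-ins I made up; in a real build the expected value comes from the vendor and the bytes come from the downloaded file.

```python
import hashlib


def verify_download(payload: bytes, expected_sha256: str) -> bool:
    """Return True only if the payload hashes to the published SHA256 value."""
    actual = hashlib.sha256(payload).hexdigest()
    return actual == expected_sha256.lower()


# Stand-in data: pretend this is the archive and the checksum published next to it.
archive = b"example archive bytes"
published = hashlib.sha256(archive).hexdigest()

print(verify_download(archive, published))                # True, the file is intact
print(verify_download(archive + b"tampered", published))  # False, reject the file
```

In a Dockerfile the same check is usually done inside the RUN instruction that downloads the file, so a mismatch fails the build.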


Every image has a startup command that runs in the container when you start it. A Dockerfile should specify at least one of the CMD or ENTRYPOINT instructions. Docker combines ENTRYPOINT and CMD into a single command to run in the container. In this example the resulting command to run in the container during startup would be /bin/bash -c node index.js:

ENTRYPOINT ["/bin/bash", "-c"]
CMD ["node", "index.js"]

In other words, CMD provides the default arguments that ENTRYPOINT is executed with. When you run the container you can replace CMD with custom arguments: docker run <image> node app.js. In this case whatever you pass as the last arguments overrides the CMD instruction. The resulting startup command then looks like /bin/bash -c node app.js

If there are multiple ENTRYPOINT or CMD instructions, only the last one of each takes effect. The last ENTRYPOINT gets its arguments from the last CMD:

# will be overridden by the next ENTRYPOINT and CMD
ENTRYPOINT ["/bin/sh", "-c"]
CMD echo "Will be overridden."
# will be an ENTRYPOINT and CMD to use
ENTRYPOINT ["/bin/bash", "-c"]
CMD echo "Hello, world"

To replace the image's ENTRYPOINT, use the docker run --entrypoint <new command> argument.

Unlike other instructions, the array syntax and the plain string syntax behave differently for ENTRYPOINT. If ENTRYPOINT is a plain string, any CMD instructions are ignored. Even if CMD is set in a base image, it appears empty. I modified the table from the documentation with a real example to make it a cheatsheet.

                           | No ENTRYPOINT              | ENTRYPOINT exec_entry p1_entry | ENTRYPOINT ["exec_entry", "p1_entry"]
---------------------------+----------------------------+--------------------------------+-----------------------------------------------
No CMD                     | error, not allowed         | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry
CMD ["exec_cmd", "p1_cmd"] | exec_cmd p1_cmd            | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry exec_cmd p1_cmd
CMD ["p1_cmd", "p2_cmd"]   | p1_cmd p2_cmd              | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry p1_cmd p2_cmd
CMD exec_cmd p1_cmd        | /bin/sh -c exec_cmd p1_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry /bin/sh -c exec_cmd p1_cmd

ENTRYPOINT exec node index.js
CMD -- --level=2
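
The rules from the table can be written down as one small function. This is a sketch of the combination logic as the table describes it, not code taken from docker. In it a Python string stands for the plain string (shell) form, a list stands for the array (exec) form, and None means the instruction is not set; the function name startup_command is mine.

```python
SHELL = ["/bin/sh", "-c"]


def startup_command(entrypoint=None, cmd=None):
    """Return the argument list the container starts with."""

    def as_argv(value):
        if value is None:
            return []
        if isinstance(value, str):  # plain string form is wrapped in a shell
            return SHELL + [value]
        return list(value)          # array form is used as is

    if entrypoint is None:
        if cmd is None:
            raise ValueError("error, not allowed: neither ENTRYPOINT nor CMD is set")
        return as_argv(cmd)
    if isinstance(entrypoint, str):
        return as_argv(entrypoint)  # plain string ENTRYPOINT ignores CMD
    return as_argv(entrypoint) + as_argv(cmd)


print(startup_command(["/bin/bash", "-c"], ["node", "index.js"]))
print(startup_command("exec node index.js", "-- --level=2"))
```

The second call reproduces the last example: the CMD with --level=2 never reaches node, because a plain string ENTRYPOINT drops it.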

A good practice is to start a service or another long-running command inside the container with exec. It allows the running process to receive SIGTERM on the docker container stop command.

If you run checks, a complicated command or a service on container startup, consider using a shell script in the ENTRYPOINT instruction. The community standard is to name it /

COPY ./ /


By default docker runs processes as the root user inside the container. Don't use root and don't install sudo in your image.

# after packages installation as root, switch to non-root user
RUN addgroup app && adduser -DH -G app app
USER app


WORKDIR sets the current directory for the instructions that follow and for the container's startup command. A good practice is to use absolute paths. Prefer WORKDIR /path over RUN cd /path && ... for readability.


The VOLUME instruction declares a mount point inside the container; the actual volume or host directory is attached to it by the docker run command. It is impossible to point VOLUME at a host directory from inside the image. Once VOLUME appears in the Dockerfile, further build steps that change files inside that directory are discarded. It's strongly recommended to use VOLUME for every part of your application that changes files while running in the container:

  • databases data
  • configuration files
  • secrets and credentials

RUN mkdir /app && touch /app/app.config
VOLUME /app
# Will not change the /app/app.config file
RUN echo "Discarded file change" > /app/app.config


Ports below 1024 are restricted to the root user in many environments. docker run -p "80:8080" lets you bind a container port to whatever host port you want, so I don't see the point of exposing a root-privileged port inside the image.

Docker build context and ENV



In a list sorted by relevance the LABEL instruction should rank higher. Set at least a maintainer label to give the users of your image your contact details.

LABEL com.example.version="0.0.1-beta"
LABEL maintainer="Eugene Gavrilov <me@egavrilov,com>"
LABEL vendor2=ZENITH\ Incorporated
LABEL com.example.release-date="2015-02-12"

Base Images

A base image is the layer from which you start building your own docker image.

The empty base image is scratch. It doesn't have any tools or OS installed. To use it you have to bring the filesystem yourself, for example by downloading and unpacking the filesystem of an OS that was installed and optimized in a virtual machine beforehand.

If you don't need a full OS in your container, there is a distro-less class of images like busybox or buildpack-deps. Images of that type have a pre-installed set of tools to run some finite process. Some of them are trimmed to the minimal possible size and do not even include sh. Once the process finishes the container stops.

As for ready-to-use operating systems, almost every Linux distro is dockerized into a base image. Popular choices such as alpine, ubuntu and centos are at your service. A good security practice here is to use official docker images maintained and checked by the docker community.

[pic of docker official image badge]

The next level of base images are software-specific base images. The same security concerns apply to them: you should absolutely trust the maintainer and their awareness of exploits. Officially supported images exist for 100+ languages and platforms: node, php, python, ruby and others.

It is critical to choose the right base image. Choosing a base image is always a trade-off, and you should spend a good amount of time researching all possible variants and investigating which image and which version of it you want to rely on. Running an enterprise application on an image created by an unknown developer in their spare time could lead to a catastrophic data breach. Think twice even about using images whose GitHub version looks acceptable. There is no guarantee that the image is built exactly from the git sources. Welcome to the wild west.

Security concerns and how to reverse engineer docker image

"Lily and James put their faith in the wrong person. Rather like you, Severus" -- Albus Percival Wulfric Brian Dumbledore

Investigating base images is a trivial but time-consuming task. Docker images aren't a black box. It is possible to recreate a base image's Dockerfile with good precision, except for one important thing: the FROM instruction of the base image. This is your security nightmare: you don't know exactly what your chosen base image is built on top of. But even the starting layer of the base image with its file structure will help us to investigate the source.

This blog runs on the Next.js framework, and its next docker image is based on node:alpine, which should in turn be based on alpine. The next image I trust the most, because it's a specific version I created myself. Its base image node is an official image. Let's check how well it's maintained, what tools it includes and how fast exploit fixes are delivered to existing versions (tags).

As I use version 16 of node, I want to know how fast the docker image catches up with the minor releases. Node.js has a separate changelog for LTS versions. Let's check CHANGELOG_V16:

curl -s | grep "Version 16."
## 2022-06-01, Version 16.15.1 'Gallium' (LTS), @BethGriggs prepared by @juanarbol
## 2022-04-26, Version 16.15.0 'Gallium' (LTS), @danielleadams
## 2022-03-17, Version 16.14.2 'Gallium' (LTS), @richardlau
## 2022-03-15, Version 16.14.1 'Gallium' (LTS), @danielleadams
## 2022-02-08, Version 16.14.0 'Gallium' (LTS), @danielleadams
## 2022-01-10, Version 16.13.2 'Gallium' (LTS), @danielleadams
## 2021-12-01, Version 16.13.1 'Gallium' (LTS), @BethGriggs
## 2021-10-26, Version 16.13.0 'Gallium' (LTS), @richardlau
## 2021-10-20, Version 16.12.0 (Current), @richardlau
## 2021-10-12, Version 16.11.1 (Current), @danielleadams
## 2021-10-08, Version 16.11.0 (Current), @danielleadams
## 2021-09-22, Version 16.10.0 (Current), @BethGriggs
## 2021-09-10, Version 16.9.1 (Current), @richardlau
## 2021-09-07, Version 16.9.0 (Current), @targos
## 2021-08-25, Version 16.8.0 (Current), @targos
## 2021-08-17, Version 16.7.0 (Current), @danielleadams
## 2021-08-11, Version 16.6.2 (Current), @BethGriggs
## 2021-08-03, Version 16.6.1 (Current), @targos
## 2021-07-29, Version 16.6.0 (Current), @BethGriggs
## 2021-07-14, Version 16.5.0 (Current), @targos
## 2021-07-05, Version 16.4.2 (Current), @BethGriggs
## 2021-07-01, Version 16.4.1 (Current), @BethGriggs
## 2021-06-23, Version 16.4.0 (Current), @danielleadams
## 2021-06-02, Version 16.3.0 (Current), @danielleadams
## 2021-05-19, Version 16.2.0 (Current), @targos
## 2021-05-04, Version 16.1.0 (Current), @targos
## 2021-04-20, Version 16.0.0 (Current), @BethGriggs

Then compare the release dates of the versions to the official image releases.

hub-tool is an experimental official Docker registry utility; if you use Docker Desktop on Windows or macOS, you already have it installed. This tool works only with Docker Hub, not with any other private or public registries.

hub-tool tag ls node --all --format json | jq '.[] | select(.Name|test("16.[0-9]+.[0-9]+-alpine$", "ix")) | .LastUpdated + " - " + .Name'
# "2022-06-07T01:45:14.919724Z - node:16.15.1-alpine"
# "2022-04-28T00:41:34.401358Z - node:16.15.0-alpine"
# "2022-04-05T19:26:18.480346Z - node:16.14.2-alpine"
# "2022-03-18T10:41:55.024898Z - node:16.14.1-alpine"
# "2022-03-17T14:23:44.855778Z - node:16.14.0-alpine"
# "2022-01-12T01:43:29.047973Z - node:16.13.2-alpine"
# "2022-01-03T22:06:58.493331Z - node:16.13.1-alpine"
# "2021-11-13T11:56:59.44397Z - node:16.13.0-alpine"
# "2021-10-22T22:31:37.866828Z - node:16.12.0-alpine"
# "2021-10-13T08:26:29.789306Z - node:16.11.1-alpine"
# "2021-10-12T00:42:53.216629Z - node:16.11.0-alpine"
# "2021-09-23T22:09:25.165946Z - node:16.10.0-alpine"
# "2021-09-13T22:31:45.627604Z - node:16.9.1-alpine"
# "2021-09-10T00:31:24.178363Z - node:16.9.0-alpine"
# "2021-09-01T09:43:14.389341Z - node:16.8.0-alpine"
# "2021-08-19T08:14:21.576138Z - node:16.7.0-alpine"
# "2021-08-18T13:59:43.866091Z - node:16.6.2-alpine"
# "2021-08-03T23:12:18.092726Z - node:16.6.1-alpine"
# "2021-07-31T02:39:46.41951Z - node:16.6.0-alpine"
# "2021-07-15T00:03:48.689319Z - node:16.5.0-alpine"
# "2021-07-07T23:34:04.912768Z - node:16.4.2-alpine"
# "2021-07-07T05:59:58.90008Z - node:16.4.1-alpine"
# "2021-07-06T22:35:53.539948Z - node:16.4.0-alpine"
# "2021-06-27T04:39:51.34728Z - node:16.3.0-alpine"
# "2021-05-28T05:08:35.921587Z - node:16.2.0-alpine"
# "2021-05-06T22:07:11.813921Z - node:16.1.0-alpine"
# "2021-04-26T21:17:29.979691Z - node:16.0.0-alpine"

The node image does not look that great when it comes to comparing dates. We treat it as an official image, but patches (including security patches) are released as docker images with a delay of 2 to (!) 25 days. There is no guarantee that a security patch will be delivered on time, even for the official images.
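
The delay is plain date arithmetic, so it is easy to script. The sketch below uses three rows copied from the two listings above and compares the Node.js release date with the LastUpdated value of the matching image tag. Keep in mind that LastUpdated is the time of the most recent push of a tag, so this measure can overestimate the delay for tags that were rebuilt later.

```python
from datetime import date

# (version, Node.js release date, image LastUpdated) copied from the listings above
SAMPLES = [
    ("16.15.0", "2022-04-26", "2022-04-28T00:41:34.401358Z"),
    ("16.14.1", "2022-03-15", "2022-03-18T10:41:55.024898Z"),
    ("16.3.0", "2021-06-02", "2021-06-27T04:39:51.34728Z"),
]


def delay_in_days(released, last_updated):
    """Days between the release and the image push, ignoring the time of day."""
    pushed = date.fromisoformat(last_updated[:10])  # keep only YYYY-MM-DD
    return (pushed - date.fromisoformat(released)).days


for version, released, last_updated in SAMPLES:
    print(version, delay_in_days(released, last_updated), "days")
```

For these rows the script reports 2, 3 and 25 days.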

But for releases of version 18 of Node the situation is much better, with a delay of 2 to 5 days. It looks acceptable. I'm considering moving from version 16 to 18 just because of that.

curl -s | grep "Version 18."
## 2022-06-01, Version 18.3.0 (Current), @bengl
## 2022-05-17, Version 18.2.0 (Current), @BethGriggs prepared by @RafaelGSS
## 2022-05-03, Version 18.1.0 (Current), @targos
## 2022-04-19, Version 18.0.0 (Current), @BethGriggs

hub-tool tag ls node --all --format json | jq '.[] | select(.Name|test("18.[0-9]+.[0-9]+-alpine$", "ix")) | .LastUpdated + " - " + .Name'
# "2022-06-07T01:22:16.665047Z - node:18.3.0-alpine"
# "2022-05-19T02:49:36.744207Z - node:18.2.0-alpine"
# "2022-05-04T01:22:24.533585Z - node:18.1.0-alpine"
# "2022-04-21T13:20:21.708946Z - node:18.0.0-alpine"

Don't forget to follow the news on the software you are using; I track vulnerabilities through CVEdetails product newsfeeds. And please keep in mind to rebuild your images on a regular basis. The Diun tool exists to keep you notified about base image updates. It offers 13 notification channels, including email, Slack and Telegram. But as for me, I think that alone is not enough to keep docker images up to date.

The more complex the stack of your dockerized application, the more time you need to investigate an image before using it. A good practice is to use minimal official images and securely install all the packages you need, but not a single one more.

When you review a base image, you should go through its Dockerfile and check every layer for malicious data. docker history decompiles the image into a Dockerfile-like view. The output of the docker history node:16-alpine command looks a bit weird:

docker history --no-trunc node:16-alpine
# sha256:153b55b2e04843874c58ff5a14dda32c03bca15007cf2cd19143bb6ecd3beb7e
#              25 hours ago   /bin/sh -c #(nop)  CMD ["node"]                        0B
# <missing>    25 hours ago   /bin/sh -c #(nop)  ENTRYPOINT [""] 0B
# <missing>    25 hours ago   /bin/sh -c #(nop) COPY file:4d192565a7220e135cab6c77fbc1c73211b69f3d9fb37e62857b2c6eb9363d51
#                                    in /usr/local/bin/                                     388B
# <missing>    25 hours ago   /bin/sh -c apk add --no-cache --virtual .build-deps-yarn curl gnupg tar \
#       ... installing yarn
#       && curl -fsSLO --compressed "$YARN_VERSION/yarn-v$YARN_VERSION.tar.gz.asc" \
#       && gpg --batch --verify yarn-v$YARN_VERSION.tar.gz.asc yarn-v$YARN_VERSION.tar.gz \
#       && mkdir -p /opt   && tar -xzf yarn-v$YARN_VERSION.tar.gz -C /opt/ \
#       && ln -s /opt/yarn-v$YARN_VERSION/bin/yarn /usr/local/bin/yarn \
#       && ln -s /opt/yarn-v$YARN_VERSION/bin/yarnpkg /usr/local/bin/yarnpkg \
#       && rm yarn-v$YARN_VERSION.tar.gz.asc yarn-v$YARN_VERSION.tar.gz \
#       && apk del .build-deps-yarn \
#       && yarn --version                                                                    7.76MB
# <missing>    25 hours ago   /bin/sh -c #(nop)  ENV YARN_VERSION=1.22.19                    0B
# <missing>    25 hours ago   /bin/sh -c addgroup -g 1000 node \
#       && adduser -u 1000 -G node -s /bin/sh -D node \
#       && apk add --no-cache libstdc++  \
#       && apk add --no-cache --virtual .build-deps curl \
#       && ARCH= && alpineArch="$(apk --print-arch)" \
#       && case "${alpineArch##*-}" \
#             ... architecture resolutions and checksums
#       && curl -fsSLO --compressed "$NODE_VERSION/node-v$NODE_VERSION.tar.xz" \
#       && curl -fsSLO --compressed "$NODE_VERSION/SHASUMS256.txt.asc" \
#       && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
#       ... compiling node
#       && rm -Rf "node-v$NODE_VERSION" \
#       && rm "node-v$NODE_VERSION.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt; \
#       && rm -f "node-v$NODE_VERSION-linux-$ARCH-musl.tar.xz" \
#       && apk del .build-deps \
#       && node --version \
#       && npm --version                                                                        98.5MB
# <missing>     25 hours ago   /bin/sh -c #(nop)  ENV NODE_VERSION=16.15.1                      0B
# <missing>     2 months ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]                               0B
# <missing>     2 months ago   /bin/sh -c #(nop) ADD file:5d673d25da3a14ce1f6cf66e4c7fd4f4b85a3759a9d93efb3fd9ff852b5b56e4 in /   5.57MB

To save your precious time I recommend using dive. Dive is a tool to view the filesystem changes made by every layer of an image. In our example we have a monstrous command that installs build packages, yarn and node at once. The right half of the screen lets us see the actual changes and decide on the security concerns with all the information at hand.

Another concern is the line COPY file:4d192565a7220e135cab6c77fbc1c73211b69f3d9fb37e62857b2c6eb9363d51 in /usr/local/bin/, which turns out to be a script to run in the container. I will repeat myself: do not trust the git repository marked as the source of the image. A docker image might be built from different sources.

Checking that the file is the same version as in the repository

diff -s \
  <(docker run --rm node:16-alpine cat /usr/local/bin/\
  <(curl -s
# Files /proc/self/fd/11 and /proc/self/fd/18 are identical

Now I'm sure the official image is legit.

Silver bullet of base image security

There is a class of Docker images called distroless. They ship neither a Linux distribution nor a shell for debugging the container or image. These images are stripped down to run only platform-specific applications. GoogleContainerTools provides images for node, java and python3. They are not the smallest possible: Alpine-based images are in most versions a bit smaller (for nodejs it is 68MB for alpine vs 75MB for distroless). But distroless images are the most secure:

  • Even if an attacker gains access to the container through exec, they would not be able to actually execute anything: a distroless image simply has no shell. The downside for the user is that not having the usual tools next to your running application makes it hard to debug.
  • Installing packages into a compromised container is also not possible, because there is no package manager.
  • Having root privileges inside a distroless container does not carry the usual caveats.
  • Secrets are almost safe with distroless: without shell access they are much harder to discover, print or send over the network.
  • You don't have to worry about newly discovered Linux distro vulnerabilities and can focus on following the news for your node, python or java version.

Security concerns #2, checklist for Dockerfile

The base image is the most important part of the Dockerfile, but you don't control it. Investigation only helps you decide whether to trust the maintainer. Once the base image is deemed legit, our own image still needs to meet a high safety standard. Best practices for securing a Docker image include, but are not limited to:

  • Use trusted images with exactly the packages you need and not a single one more
  • Pin a concrete version of the base image; the latest tag is a smell
  • Run as a non-root user
  • Follow the CVE reports for the software you use
  • Double check that no credentials have leaked into your image
  • When downloading files from the internet into the image, use an HTTPS URL and verify the checksum of the file

In the previous section we identified a trustworthy base image. Please check once again that it doesn't include anything you don't need. Your trustworthy base image is just one tag of an image: don't marry the whole stack, and use the tag you checked, not the latest. A trickster with a hacking background is hiding around the corner, waiting to compromise the latest build and count the sheep who followed.

Undiscovered vulnerabilities are also out there. Don't make the hacker's job easier: run the process in the container with the minimal privileges it needs. By default Docker runs the process as the root user inside the container, which is why the USER instruction should appear in the Dockerfile right after package installation. Don't mess up the permissions of your files by placing the USER instruction after you have already created all the files and executables as root.
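Two items of the checklist, a pinned base image tag and a non-root user, are easy to check automatically. Below is a rough sketch of such a check, not a replacement for a real scanner. The function name lint_dockerfile is my own invention, and the check only reads the Dockerfile text, so it cannot know whether the base image already switches to a non-root user.

```shell
#!/usr/bin/env bash
# Rough Dockerfile lint for two checklist items: pinned base images and a
# non-root USER. lint_dockerfile is a hypothetical helper, not a Docker command.
lint_dockerfile() {
    local file="$1" status=0 unpinned final_user

    # Collect base images that have no tag or use "latest".
    # Digest pins, "scratch" and references to earlier stages are skipped.
    unpinned=$(awk '
        toupper($1) == "FROM" {
            img = ""
            for (i = 2; i <= NF; i++) if ($i !~ /^--/) { img = $i; break }
            known = (img in stages)
            if (toupper($(i + 1)) == "AS") stages[$(i + 2)] = 1
            if (img == "" || img == "scratch" || known || img ~ /@sha256:/) next
            n = split(img, parts, "/")
            if (parts[n] !~ /:/ || parts[n] ~ /:latest$/) print img
        }' "$file")

    # USER of the final stage: reset on every FROM, keep the last USER seen.
    final_user=$(awk '
        toupper($1) == "FROM" { user = "" }
        toupper($1) == "USER" { user = $2 }
        END { print user }' "$file")

    if [ -n "$unpinned" ]; then
        printf '%s\n' "$unpinned" | sed 's/^/unpinned base image: /'
        status=1
    fi
    case "$final_user" in
        ""|root|root:*|0|0:*)
            echo "final stage runs as root: add a non-root USER"
            status=1 ;;
    esac
    return "$status"
}
```

Calling lint_dockerfile ./Dockerfile in CI before docker build fails the pipeline early when one of the two rules is violated.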

Use snyk to test the image for vulnerabilities...

There is a class of crawlers on public Docker registries that download images and search them layer by layer for credentials, tokens and certificates. Make sure sensitive data is excluded in the .dockerignore file.

My favorite mistake, one I have made too many times: when downloading a file from the internet, always verify it against a checksum hosted somewhere else or even built into the image. Both URLs must be HTTPS. No exceptions.
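A minimal sketch of such a checksum verification, assuming sha256sum is available in the image (it is in both coreutils and BusyBox). The helper name verify_sha256 is hypothetical, and the file name and hash in the usage comment are placeholders you have to supply yourself.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE has the expected SHA-256 hash.
verify_sha256() {
    local file="$1" expected="$2"
    # sha256sum -c reads "<hash>  <file>" lines (two spaces) from stdin
    echo "${expected}  ${file}" | sha256sum -c - >/dev/null 2>&1
}

# Usage inside a RUN instruction (placeholders, not real values):
#   curl -fsSLO "https://<host>/<file>" \
#     && echo "<expected sha256>  <file>" | sha256sum -c -
```

The expected hash should be written into the Dockerfile or fetched from a different host than the file itself, otherwise an attacker who replaces the file can replace the checksum as well.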

Docker image size and build time optimizations

  • Use a minimal base image
  • Review installed packages
  • Add .dockerignore files
  • Use BuildKit caching configuration

Use base images of minimal size

In most cases the app doesn't need a full server OS installation to run. The generic advice is to use the smallest base image that still allows your app to work properly. A popular choice is Alpine.

Currently the most used base images are:

  • Ubuntu 22.04 >> 29.01MB
  • Alpine 3.16.0 >> 2.67MB
  • Debian Bullseye >> 53.41MB
  • Debian Bullseye Slim >> 30.89MB
  • Centos 8.4.2105 >> 73.15MB

These numbers are taken from Docker Hub and are not absolute, because they show the compressed image size. Once you pull the image to your local machine, docker shows the real size:

docker images
# REPOSITORY                                     TAG             IMAGE ID       CREATED        SIZE
# debian                                         bullseye-slim   abc589ebc613   8 days ago     80.4MB
# debian                                         bullseye        d2780094a226   8 days ago     124MB
# ubuntu                                         22.04           27941809078c   3 weeks ago    77.8MB
# alpine                                         3.16.0          e66264b98777   5 weeks ago    5.53MB
# centos                                         8.4.2105        5d0da3dc9764   9 months ago   231MB

The real sizes of these base images are:

  • Ubuntu 22.04 >> 77.8MB
  • Alpine 3.16.0 >> 5.53MB
  • Debian Bullseye >> 124MB
  • Debian Bullseye Slim >> 80.4MB
  • Centos 8.4.2105 >> 231MB

As you can see, comparing the numbers from Docker Hub doesn't make sense for the basic usage scenario, because on disk the image takes its unpacked size.
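If you want the unpacked size of a single image in a script, docker image inspect can print it in bytes. A small sketch follows; the helper name image_size_mb is made up for this article, and the conversion assumes decimal megabytes (1MB = 1,000,000 bytes), which is how the SIZE column of docker images appears to be calculated.

```shell
#!/usr/bin/env bash
# image_size_mb IMAGE
# Hypothetical helper: prints the unpacked size of a local image in MB.
image_size_mb() {
    local bytes
    # .Size is the size of the image in bytes as stored locally
    bytes=$(docker image inspect --format '{{.Size}}' "$1") || return 1
    awk -v b="$bytes" 'BEGIN { printf "%.2f\n", b / 1000000 }'
}
```

For example, image_size_mb alpine:3.16.0 should print a number close to the 5.53MB from the listing above, provided the image has been pulled first.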

Reduce the size of installed packages

The largest part of the size optimization of a Docker image lies in separating the packages you need to build your application from the packages you need to actually run it.

Multi-staged builds

Docker build creates a layer for every instruction and does not forgive deletions at the end of the build. Previous layers keep the files deleted later on, which increases the size of the image and may leave information you do not want to expose in the hidden layers.

Multi-stage builds were introduced in Docker 17.05, released in 2017. The concept is to define a sequence of temporary builds inside one Dockerfile. Each build is called a stage and has its own temporary name. In those stages we prepare data to use in our image without storing the created layers. Every stage has its own base image. Files can be copied individually from previous stages to the current one.

To build a static website with GatsbyJS, Docker needs node, npm packages and build tools. But a web server such as nginx is the only package needed to serve the static files. The Dockerfile I use for GatsbyJS is an example of a multi-stage build:

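# Stage 1: install npm packages only, so this layer is reused until the lockfile changes
FROM node:16-buster-slim AS deps
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

# Stage 2: build the static site using the packages from the deps stage
FROM node:16-buster-slim AS builder
WORKDIR /app
COPY . .
COPY --from=deps /app/node_modules ./node_modules
RUN ["yarn", "build"]

# Stage 3: the final image contains only nginx and the generated html files
FROM nginxinc/nginx-unprivileged:1.22-alpine AS runtime
COPY --from=builder /app/public /usr/share/nginx/html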

This build has three stages: deps, builder and the final runtime stage. The deps stage takes care of downloading npm packages. The builder stage copies the sources of the GatsbyJS application from the local filesystem, replaces the node_modules folder with the downloaded and compiled npm packages, and then builds the static html website into the /app/public folder. The final stage takes the html files and nothing more: no nodejs and no npm packages, just static files are saved to the final image. All temporary layers from the previous stages are discarded. The final image is just 25MB, which still leaves room for improvement, but is acceptable compared to the ~350MB build without multi-stage.

docker images
# tmp/nginx-gatsby  latest      4e9e23cc20d0   8 seconds ago    25.4MB

Caching stages of multi stage build with "--cache-from"

There is a popular opinion that techniques such as combining multiple RUN instructions into one, and other image size optimizations, have become obsolete inside the stages of a multi-stage build.

But in some scenarios it is helpful to cache stages and reuse them when the filesystem stays the same up to a certain stage. It can be a time saver for local builds and in a CI flow. Just imagine your CI running in a serverless environment and being charged for the CPU time spent creating the same unchanged layers over and over to build a specific version of a package. That is tens of dollars thrown away because no caching is used. Surprisingly, caching a multi-stage build is similar to caching a build without stages. To build the stages separately we will use the --target <stage> option of the docker build command.

Taking the Dockerfile from the example above, we need to run docker build 3 times:

docker build \
 --target deps \
 --cache-from <registry username>/<app>:dependencies \
 -t <registry username>/<app>:dependencies \
 .

docker build \
 --target builder \
 --cache-from <registry username>/<app>:builder \
 -t <registry username>/<app>:builder \
 .

docker build \
 --target runtime \
 -t <registry username>/<app>:latest \
 -t <registry username>/<app>:<version> \
 .

There are two strategies for caching stages: locally and remotely. You will likely use a local cache in standalone CI or on your machine, and a remote cache for serverless scenarios. For caching stages locally you are already set, no additional steps are needed. Whenever docker successfully builds a stage, it replaces the latest image of this stage in the local image store. To cache remotely we need to pull the stage images from the remote registry before the build and push them back after it.

docker pull <registry username>/<app>:dependencies || true
docker build --target deps --cache-from <registry username>/<app>:dependencies -t <registry username>/<app>:dependencies .

docker pull <registry username>/<app>:builder || true
docker build --target builder --cache-from <registry username>/<app>:builder -t <registry username>/<app>:builder .

docker build --target runtime -t <registry username>/<app>:latest -t <registry username>/<app>:<version> .

# Push stage cache only on successful build, placing all pushes to the end
docker push <registry username>/<app>:dependencies
docker push <registry username>/<app>:builder

# Publish the app image
docker push <registry username>/<app>:latest
docker push <registry username>/<app>:<version>

When possible, I always merge commands that create files with the commands that delete those same files into a single RUN line. This is because each RUN line adds a layer to the image; the output is quite literally the filesystem changes that you could view with docker diff on the temporary container it creates. If you delete a file that was created in a different layer, all the union filesystem does is register the filesystem change in a new layer; the file still exists in the previous layer and is shipped over the network and stored on disk. So if you download source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want this all done in a single layer to reduce image size. Either way, the final stage image of a multi-stage build will be smaller for deployment.
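The effect is easy to reproduce. The sketch below builds two throwaway images from Dockerfiles passed on stdin: one creates and deletes a 50MB file in separate RUN instructions, the other does both in a single RUN. The function names and the tmp/layers-* image names are placeholders I made up for the experiment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a local experiment; image names are placeholders.

# The file is created in one layer and deleted in the next one:
# the 50MB stay in the first layer and therefore in the image.
build_split_layers() {
    docker build -t tmp/layers-split - <<'EOF'
FROM alpine:3.16.0
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=50
RUN rm /tmp/blob
EOF
}

# The file is created and deleted inside the same RUN instruction:
# the resulting layer never contains it.
build_merged_layers() {
    docker build -t tmp/layers-merged - <<'EOF'
FROM alpine:3.16.0
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=50 && rm /tmp/blob
EOF
}
```

After running both functions, compare docker history tmp/layers-split with docker history tmp/layers-merged: the split variant should carry a layer of about 50MB that the merged variant does not have.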

Below is a simplified, not final bash script that uses stage caches in a multi-stage build of 3 stages: dependencies, build and runner. Please adapt the script to your needs.

APPNAME="<name>"
# Stage names as listed above; one Dockerfile.<stage> per stage is expected
STAGES=(dependencies build runner)

for stage in "${STAGES[@]}"; do
    # Use cache from remote repository, tag as latest, keep cache metadata
    docker build -t "$APPNAME:$stage-latest" \
          --cache-from "$APPNAME:$stage-latest" \
          --build-arg BUILDKIT_INLINE_CACHE=1 \
          -f "./Dockerfile.$stage" .

    # Push new build up to remote repository replacing latest
    docker push "$APPNAME:$stage-latest"
done

Docker image name, tag and digest

When docker builds an image with the command docker build ./ it creates an image without a name:

docker build ./
# ...
# [+] Building 224.7s (21/21) FINISHED

docker images
# REPOSITORY  TAG         IMAGE ID       CREATED          SIZE
# <none>      <none>      1ff2b48990d0   10 seconds ago   12.1MB

Managing images by hashes like 1ff2b48990d0 is a difficult task. You can still guess what image it is by running docker history 1ff2b48990d0 to see the Dockerfile representation of the built image, but it is time consuming and inconvenient. Previously in the examples we used the -t option to set a name for the new image.

docker build --help
# Usage:  docker build [OPTIONS] PATH | URL | -
# Build an image from a Dockerfile
# Options:
# ...
#  -t, --tag list                Name and optionally a tag in the 'name:tag' format

With the --tag/-t option the docker image gets its local names and tags, which you can reuse in the docker run <name>:<tag> command.

docker build -t rust-test-image -t rust-test-image:test -t rust-image:stage ./
# ...
# [+] Building 231.2s (21/21) FINISHED

docker images
# REPOSITORY        TAG         IMAGE ID       CREATED          SIZE
# rust-image        stage       1ff2b48990d0   59 minutes ago   12.1MB
# rust-test-image   latest      1ff2b48990d0   59 minutes ago   12.1MB
# rust-test-image   test        1ff2b48990d0   59 minutes ago   12.1MB

To set a tag after the image was built, use the docker tag <source image> <name>:<tag> command:

docker tag <image name or digest> rust-test-image:latest
docker tag <image name or digest> rust-test-image:test
docker tag <image name or digest> rust-test-image:stage

Testing docker images

There is no guarantee that your app will start correctly in a container created from the image you built. Most of the images on Docker Hub have no tests at all. Without image testing it is easy to ruin a production deployment. The areas to cover:

  • Container interaction
  • Network: is the socket available from outside the container?
  • File system: file owners and permissions
  • Core software functionality

Covering these areas guards against the following faults:

  • Breaking changes from a newer Docker version
  • Breaking changes from a new version of the software itself
  • Modifying the Dockerfile and breaking key functionality

The question to answer: if you use a 3rd party image in your Continuous Integration (CI), how do you know the app will run at all when the container starts? And how do you know the security guidelines of your company are not broken by an image not developed inside the company?

We will review three approaches to testing the image:

  • Static tests of the image using Google Container Structure Test
  • Writing bash script tests
  • A brief overview of dynamic tests using Chef InSpec

Writing tests

The approach you can trust the most is to write the tests yourself as a bash script inside the image itself. You can run the test script by replacing the entrypoint of the container: docker run --entrypoint /bin/sh <image> -c /app/. Note that --entrypoint takes only the executable; its arguments go after the image name. If it is a 3rd party image you want to test, you can mount your test file inside the container: docker run -v ./tests/ --entrypoint /bin/sh <image> -c /app/
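A test script of this kind does not need a framework. Here is a minimal sketch of one; the check helper, the run_image_tests function and the concrete checks (the /app directory, a non-root user) are assumptions for illustration, so replace them with whatever matters for your image.

```shell
#!/bin/sh
# Minimal test runner meant to be copied into (or mounted in) the image.
# check and run_image_tests are hypothetical names; the checks are examples.
failures=0

# check DESCRIPTION COMMAND [ARGS...]
# Runs the command silently and reports whether it succeeded.
check() {
    description="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok   - $description"
    else
        echo "FAIL - $description"
        failures=$((failures + 1))
    fi
}

run_image_tests() {
    check "app directory exists"   test -d /app
    check "not running as root"    test "$(id -u)" -ne 0
    check "shell is available"     command -v sh
    # Succeed only when every check passed
    [ "$failures" -eq 0 ]
}
```

End the script with a call to run_image_tests so that the container exits with a non-zero code when any check fails, which is what your CI will look at.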

But writing tests is time consuming and adds maintenance complexity for developers. To save budget and time on typical Docker image testing you can use open source tools for static and dynamic testing.

Static tests

If we want to test a freshly created Docker image for certain configuration values or the existence of files, there is a good tool for it: container-structure-test from Google Container Tools. The tool is able to check images without starting them at all. Similar to unit tests, errors can be identified quickly before time-consuming tests (inside the container) are carried out.

Various static tests to run on the image may include:

  • file existence
  • configuration errors
  • command output
  • compliance with the security guidelines of your company

The structure test tool has great documentation, and I recommend relying on the manuals from its GitHub page rather than on the overview examples in this article.

Example run of the test framework after its installation:

container-structure-test test \
    --image <image> \
    --driver docker \
    --config config.yaml

Here <image> is the local image you want to test, and config.yaml is the test configuration file. The configuration file includes descriptions of:

  • Command Tests
  # The name of the test
  - name: "gunicorn flask"
      # A list of commands (each with optional flags) to run before the actual command under test.
      # To avoid unexpected behavior and output when running commands in the containers, **all entrypoints are overwritten by default.** If your entrypoint is necessary for the structure of your container, use the `setup` field to call any scripts or commands manually before running the tests.
    setup: [["virtualenv", "/env"], ["pip", "install", "gunicorn", "flask"]]
    # The command to run in the test.
    command: "which"
    # The arguments to pass to the command.
    args: ["gunicorn"]
    # List of regexes that should match the stdout from running the command.
    expectedOutput: ["/env/bin/gunicorn"]
  - name:  "apt-get upgrade"
    command: "apt-get"
    args: ["-qqs", "upgrade"]
    # List of regexes that should **not** match the stdout from running the command.
    excludedOutput: [".*Inst.*Security.* | .*Security.*Inst.*"]
    # List of regexes that should **not** match the stderr from running the command.
    excludedError: [".*Inst.*Security.* | .*Security.*Inst.*"]
  • File Existence Tests
- name: 'Root'
  # Path to the file or directory under test
  path: '/'
  # Whether or not the specified file or directory should exist in the file system
  shouldExist: true
  # The expected Unix permission string (e.g. drwxrwxrwx) of the files or directory.
  permissions: '-rw-r--r--'
  # The expected Unix user ID of the owner of the file or directory.
  uid: 1000
  # The expected Unix group ID of the owner of the file or directory.
  gid: 1000
  # Checks if file is executable by a given user. One of `owner`, `group`, `other` or `any`
  isExecutableBy: 'group'
  • File Content Tests
- name: 'Debian Sources'
  path: '/etc/apt/sources.list'
  # List of regexes that should match the contents of the file
  expectedContents: ['.*httpredir\.debian\.org.*']
  # List of regexes that should **not** match the contents of the file
  excludedContents: ['.*gce_debian_mirror.*']
  • Metadata Test
  # A list of environment variable key/value pairs that should be set in the container. isRegex (_optional_) interpretes the value as regex.
    - key: foo
      value: baz
  # A list of image labels key/value pairs that should be set on the container. isRegex (_optional_) interpretes the value as regex.
    - key: 'com.example.vendor'
      value: 'ACME Incorporated'
    - key: 'build-date'
      value: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}$'
      isRegex: true
  # The ports exposed in the container.
  exposedPorts: ["8080", "2345"]
  # The volumes exposed in the container.
  volumes: ["/test"]
  # The entrypoint of the container.
  entrypoint: []
  # The CMD specified in the container.
  cmd: ["/bin/bash"]
  # The default working directory of the container.
  workdir: "/app"
  # The default user of the container.
  user: "luke"
  • License Tests
  # If the image is based on Debian, check where Debian lists all licenses.
- debian: true
  # A list of other files to check.
  files: ["/foo/bar", "/baz/bat"]
  • Environment Variables
  - key: "VIRTUAL_ENV"
    value: "/env"
  - key: "PATH"
    value: "/env/bin:$PATH"
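To try the tool quickly, a complete config.yaml can be very small. The sketch below writes one from a shell function. The function name write_structure_test_config and the concrete checks are my own example, not taken from the tool's documentation; as far as I know the schemaVersion line is required in every config file, so verify the current value against the project's README.

```shell
#!/usr/bin/env bash
# write_structure_test_config PATH
# Hypothetical helper: writes a small example config to PATH.
write_structure_test_config() {
    cat > "$1" <<'EOF'
schemaVersion: 2.0.0

commandTests:
  # Fails if the container process would run as uid 0
  - name: "runs as non-root"
    command: "id"
    args: ["-u"]
    excludedOutput: ["^0$"]

fileExistenceTests:
  - name: "app directory"
    path: "/app"
    shouldExist: true

metadataTest:
  workdir: "/app"
EOF
}
```

After write_structure_test_config config.yaml, run container-structure-test test --image <image> --config config.yaml against the image you want to check.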

Running tests without Docker

By default container-structure-test runs intermediate containers in Docker. If you don't have Docker in your CI, you can test the image without running it by changing the option --driver docker to --driver tar. In this case the image is extracted as a filesystem and all tests run on top of this filesystem. The limitation of this approach is that you cannot test commands and their output: all the configuration under commandTests is ignored.

Dynamic tests

To test the behavior of a running application we should go a step further and test our app while it is running. For testing running containers we can use the tool InSpec. It was designed as a tool to describe and check compliance rules according to the "compliance as code" principle.

Compliance as code methods ensure that the correct regulatory or company compliance requirements are fulfilled with zero-touch on the path to production. They build compliance into development and operations. Compliance as code tools enable stakeholders to ensure that production processes are compliant by defining how resources must be configured. Such a structure often allows these tools to automatically adjust resources into a compliant state in order to meet the pre-defined compliance requirements.

Instead of writing compliance rules as text documentation, the requirements are cast into code. I refer you to the InSpec documentation to discover how to create test scenarios.

