
Optimizing Dockerfiles for Dev and Deployment

Jeff Vincent
January 9, 2024


A Dockerfile defines a Docker image, which in turn defines a Docker container. In this post, we'll look at exactly what an image is, how it is built, how you can reduce its overall size for improved performance in production, and how you can reduce image build time to streamline development and debugging.

What is a container

Containers are repeatable and transferable portions of a Linux operating system (OS). They are similar to virtual machines in that the code they run is largely self-contained, and as such it runs independently from the underlying host machine. However, unlike a virtual machine, which constitutes a complete OS, containers are lighter-weight, because they utilize portions of the underlying host OS — such as the Linux kernel — and they only contain the topmost layer of the OS, the application layer.

What is a container image

Docker images are the definitions from which containers are created, or spun up. Images are defined in Dockerfiles, and once created they never change state. That is, they remain unchanged in the same state in which they were built even after a container based on a given image has been spun up.

How are images built

Images are built in layers. Each command defined in a Dockerfile adds another layer to the image. Each layer created while building an image is cached, and the subsequent command is run "on top of" that cached layer. The final image generated when you run docker build is the product of these incremental build steps, each of which results in an "intermediate layer" that serves as the foundation for the following step. This continues until the final image is built, which can then be pushed to a registry to later be pulled, either locally for development or by a container orchestration platform such as Kubernetes for deployment.

For example, all Dockerfiles begin with the FROM command, which is used to define the “base image,” or the starting point for the incremental layering process described above. Base images can be either completely “stock” Linux operating systems, or they can include additional layers on top of the OS itself.

FROM ubuntu:latest
FROM python:3.10
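As a minimal sketch of this layering (the curl package and the /app/run.sh script here are hypothetical, purely for illustration), each instruction below adds a layer on top of the ubuntu base image:

```dockerfile
# layer(s): the ubuntu base image
FROM ubuntu:latest

# layer: the package index and installed packages
RUN apt-get update && apt-get install -y curl

# layer: application files copied from the build context
COPY . /app

# adds image metadata (the startup command) rather than filesystem content
CMD ["/app/run.sh"]
```

Running docker history against the built image lists these layers, along with the size each one contributes.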


Why image size matters

Because images are designed to be moved from one developer's machine to another, and eventually to be run on a production server, there is an inherent benefit to keeping the image size as small as possible. This makes the image faster to download, or "pull," to a developer's machine, and it reduces the storage and network resources required to distribute and run the container in a production environment, which ultimately improves performance and reduces the cost of hosting the software defined in the container.

How to reduce image size

Only add exactly what the container needs in order to run. For example, as shown below, there will often be files in a git repository that are not required for the code itself to run. These may be configuration files, README.md files, or any variety of files that are generated during development but that ultimately don't need to be included in the image.

jeff@Jeffs-Air go-gin-redis-mongodb % ls -a
.          .DS_Store   .github    LICENSE     images
..         .git        .vscode    README.md   redis-gin-mongo

FROM ubuntu:latest
COPY . .

If the above Dockerfile were defined in the root of the above git repository, all of these unnecessary files would be included in the resulting image. And while this makes the Dockerfile very simple to write, it unnecessarily increases the size of the resulting image.

A better approach would be to COPY exactly what you need into the image, rather than copying everything in the current directory and its subdirectories, like so:

FROM ubuntu:latest
WORKDIR /app
COPY redis-gin-mongo .

This way, we can avoid bloating the image with all the unnecessary files listed above.
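Alternatively, if you prefer to keep COPY . ., you can exclude files from the build context with a .dockerignore file placed next to the Dockerfile. A sketch based on the repository listing above:

```
.git
.github
.vscode
.DS_Store
LICENSE
README.md
images
```

Files matching these patterns are never sent to the Docker daemon as part of the build context, so they cannot end up in the image.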

Multi-stage builds for compiled languages

Similarly, for compiled languages such as Go, the binary produced by compilation often has far fewer runtime requirements than the build process itself does. For example, a Go binary can run in any Linux environment without Go installed. This allows us to further reduce the size of the container image with multi-stage builds, like the following:

# first (build) stage
FROM golang:1.18 as builder
WORKDIR /app
COPY . .
RUN go mod download
RUN CGO_ENABLED=0 go build -v -o app .

# final (target) stage
FROM alpine:3.10
WORKDIR /root/
COPY --from=builder /app/app ./
CMD ["./app"]

In the above Dockerfile, we are creating two separate images. The first has a base image of golang:1.18, which includes everything required to download the dependencies and compile a Go binary. But then we define an entirely new image, one with a base image of alpine:3.10 that contains only the minimum requirements of a Linux OS. Then we copy the compiled binary from the first "builder" stage into the second image, which means that the final "target" image includes only a minimal Linux OS plus our compiled binary, a single executable file that is run at container startup.

Leveraging cached layers to reduce build time

As described above, Docker images are built in layers, and each layer is cached locally. Because of this, we can dramatically reduce the build time of a given image when a code change is made by placing the build steps most likely to change near the end of the Dockerfile.

For example, the following Dockerfile begins with a base image of python:3.10, which will be downloaded the first time the image is built during local development. This layer will be cached locally, so it won't need to be downloaded again for subsequent builds. Next, we copy the requirements.txt file that defines the application's dependencies and then install the dependencies. Finally, in a separate build step, we copy the local file that is most likely to change during the course of local development, main.py, into the image.

FROM python:3.10
WORKDIR /app
COPY ./src/web_api/requirements.txt .
RUN pip install -r requirements.txt
COPY ./src/web_api/main.py .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

By taking this approach, we can avoid the need to download and install all the application dependencies, and instead use a cached layer that already contains them, and then copy the locally changed files into the cached image layer — thus dramatically reducing the time it takes for a given image to be built.
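For contrast, the following ordering (using the same hypothetical project layout as above) would invalidate the dependency cache on every code change, because the source files are copied before the dependencies are installed:

```dockerfile
FROM python:3.10
WORKDIR /app

# copying all source first means any change to main.py busts the cache here...
COPY ./src/web_api .

# ...forcing dependencies to be re-installed on every single build
RUN pip install -r requirements.txt

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The two Dockerfiles produce equivalent images; only the build-time caching behavior differs.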

Initial build:

jeff@Jeffs-Air web_api % docker build .
[+] Building 58.8s (10/10) FINISHED                                                                            
=> [internal] load build definition from Dockerfile                                                       0.0s
=> => transferring dockerfile: 218B                                                                       0.0s
=> [internal] load .dockerignore                                                                          0.0s
=> => transferring context: 2B                                                                            0.0s
=> [internal] load metadata for docker.io/library/python:3.10                                             0.4s
=> [internal] load build context                                                                          0.0s
=> => transferring context: 5.55kB                                                                        0.0s
=> [1/5] FROM docker.io/library/python:3.10@sha256:14b683d63e171ad811b5ff3d55d3a1138f34bb324e2e222fe37b  51.9s
=> => resolve docker.io/library/python:3.10@sha256:14b683d63e171ad811b5ff3d55d3a1138f34bb324e2e222fe37b6  0.0s
=> => sha256:26c861b53509d61c37240d2f80efb3a351d2f1d7f4f8e8ec2e5004c1d86af89c 10.87MB / 10.87MB           8.4s
=> => sha256:1e5abeb51064e5c8d93027e4c9d3328827b3ba84f0a84d04801650bdec93abac 2.22kB / 2.22kB             0.0s
=> => sha256:7971239fe1d69763272ccc0b2527efa95547d37c53630ed0a71db4e00d3ef964 5.15MB / 5.15MB             1.3s
=> => sha256:8022b074731d9ecee7f4fba79b993920973811dda168bbc08636f18523b90122 53.70MB / 53.70MB          20.1s
=> => sha256:14b683d63e171ad811b5ff3d55d3a1138f34bb324e2e222fe37b6eab121a8517 2.14kB / 2.14kB             0.0s
=> => sha256:2012a8e6dfd8a16b02847c1a20762a4b363e6650c791cb7eda8b177b4ee1f56f 7.92kB / 7.92kB             0.0s
=> => sha256:1714880ecc1c021a5f708f4369f91d3c2c53b998a56d563d0a9aa9be2488d794 54.68MB / 54.68MB          17.2s
=> => sha256:895a945a1f9ba441c2748501c4d46569edfbc2bfbdb9b47d41e753e752247fdc 189.73MB / 189.73MB        46.4s
=> => sha256:2a83fe9e3053a5f52e699f1078c2abeaad849bb81da9068f1aa541bb5673a21f 6.40MB / 6.40MB            20.0s
=> => sha256:6390a24d41bf3ec2e2b4ce64404496204b82daacae8044a117dd371ade9f3277 17.26MB / 17.26MB          24.3s
=> => sha256:cf1538b1f5d7391ee80648de900ceafad965584c6d998b1ead58019c741e5387 247B / 247B                20.3s
=> => extracting sha256:8022b074731d9ecee7f4fba79b993920973811dda168bbc08636f18523b90122                  1.3s
=> => sha256:80562cfbfd10b63f91a4dafe765caad7c66c6025acfa11638dabde9b1d6af90c 3.08MB / 3.08MB            22.2s
=> => extracting sha256:7971239fe1d69763272ccc0b2527efa95547d37c53630ed0a71db4e00d3ef964                  0.1s
=> => extracting sha256:26c861b53509d61c37240d2f80efb3a351d2f1d7f4f8e8ec2e5004c1d86af89c                  0.1s
=> => extracting sha256:1714880ecc1c021a5f708f4369f91d3c2c53b998a56d563d0a9aa9be2488d794                  1.4s
=> => extracting sha256:895a945a1f9ba441c2748501c4d46569edfbc2bfbdb9b47d41e753e752247fdc                  4.2s
=> => extracting sha256:2a83fe9e3053a5f52e699f1078c2abeaad849bb81da9068f1aa541bb5673a21f                  0.2s
=> => extracting sha256:6390a24d41bf3ec2e2b4ce64404496204b82daacae8044a117dd371ade9f3277                  0.4s
=> => extracting sha256:cf1538b1f5d7391ee80648de900ceafad965584c6d998b1ead58019c741e5387                  0.0s
=> => extracting sha256:80562cfbfd10b63f91a4dafe765caad7c66c6025acfa11638dabde9b1d6af90c                  0.1s
=> [2/5] WORKDIR /app                                                                                     0.2s
=> [3/5] COPY requirements.txt .                                                                          0.0s
=> [4/5] RUN pip install -r requirements.txt                                                              6.0s
=> [5/5] COPY main.py .                                                                                   0.0s
=> exporting to image                                                                                     0.2s
=> => exporting layers                                                                                    0.2s
=> => writing image sha256:256eb122b7fc860649e77b218f67c51d316c94e1ad5326066daacaeb9ee42e21               0.0s

Subsequent build (after changing code in main.py):

jeff@Jeffs-Air web_api % docker build .
[+] Building 0.9s (10/10) FINISHED                                                                              
=> [internal] load build definition from Dockerfile                                                       0.0s
=> => transferring dockerfile: 37B                                                                        0.0s
=> [internal] load .dockerignore                                                                          0.0s
=> => transferring context: 2B                                                                            0.0s
=> [internal] load metadata for docker.io/library/python:3.10                                             0.8s
=> [internal] load build context                                                                          0.0s
=> => transferring context: 5.12kB                                                                        0.0s
=> [1/5] FROM docker.io/library/python:3.10@sha256:14b683d63e171ad811b5ff3d55d3a1138f34bb324e2e222fe37b6  0.0s
=> CACHED [2/5] WORKDIR /app                                                                              0.0s
=> CACHED [3/5] COPY requirements.txt .                                                                   0.0s
=> CACHED [4/5] RUN pip install -r requirements.txt                                                       0.0s
=> [5/5] COPY main.py .                                                                                   0.0s
=> exporting to image                                                                                     0.0s
=> => exporting layers                                                                                    0.0s
=> => writing image sha256:394b621c54089cd23529fae173eb47ff6bc591ee1c745811fa93cb81f12b86e5               0.0s

Conclusion

Docker images, the basis for Docker containers, are built in layers. Each build step results in a locally cached intermediate layer of the image. Above, we looked at ways in which you can leverage this build and cache process to reduce build time for local development.

We also looked at ways that you can reduce the overall size of your image in order to reduce your cloud footprint, and thus increase performance and reduce the cost of hosting your Docker container in a production environment.
