Flattening Docker images
Docker images are stored as layers. To be more precise, the filesystem is layered, so each change (RUN, COPY,…) adds a new layer. This approach has many advantages - images can be built upon other images, and the layers from these base images are shared. If you use many images, you will find that some of their layers are probably already downloaded, so you don’t have to pull them again. And each layer is pretty much an image itself, so containers can be created from any layer you want.
In many cases, you probably don’t need to think about layers, unless you are peeling onions or watching a movie with that green ogre. Or unless you have a rootless container on RHEL 7 with limited storage space. In this case, rootless containers are limited to VFS storage. Each created container takes up the full size of the image - if the image is 5 GB, each container takes 5 GB.
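To check whether a host is affected, you can ask the daemon which storage driver it uses. This is a minimal sketch, assuming the docker CLI is on the PATH; uses_vfs is just an illustrative helper name, not something Docker provides.

```shell
# Succeeds (exit status 0) when the daemon reports the vfs storage driver.
# uses_vfs is a hypothetical helper name, not part of Docker.
uses_vfs() {
    # docker info prints the active storage driver, e.g. "overlay2" or "vfs".
    driver=$(docker info --format '{{.Driver}}') || return 2
    [ "$driver" = "vfs" ]
}

# Usage:
#   uses_vfs && echo "VFS in use: every container is a full copy of its image"
```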
What’s more - each layer is a deep copy of the previous one with added changes. As you can imagine, this can lead to ridiculous storage requirements. A 4 GB RStudio image may need more than 60 GB (probably even more than 120 GB) under VFS and each container will also need more than 60 GB. So squashing layers in this environment is a must.
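To see how this adds up, here is a rough estimate in shell. It assumes that VFS stores layer k as a complete copy of everything in layers 1 to k. The layer sizes are made-up figures for a small 4 GB image, not measurements.

```shell
# Estimate VFS disk usage for an image.
# Arguments: the size each layer adds, in MB (hypothetical figures).
vfs_estimate() {
    cumulative=0   # size of the filesystem after the current layer
    total=0        # disk space used by all layer copies so far
    for delta in "$@"; do
        cumulative=$((cumulative + delta))
        total=$((total + cumulative))
    done
    echo "$total"
}

# Five layers adding up to 4000 MB:
# 1000 + 1500 + 2000 + 3000 + 4000 = 11500
vfs_estimate 1000 500 500 1000 1000   # prints 11500
```

Even with only five layers, a 4 GB image needs 11.5 GB in this model. With dozens of layers the total grows much faster, and every container adds another full copy on top.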
We found several ways to do it, but only our third attempt was successful.
#1 docker export
The command exports the filesystem of a container to a tar archive, and the output can be piped to docker import to create a new image.
While this produces a new flat image with one layer, only the filesystem is copied to the new image. All metadata, such as environment variables and commands, is lost. So the approach is not very suitable for many images.
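The export approach can be sketched as a small shell function. The function name and the image names in the usage comment are placeholders, and the sketch assumes the source image defines a default command, since docker create needs one to create the container.

```shell
# Flatten an image by exporting a container's filesystem and importing it.
# flatten_by_export is a hypothetical helper name.
# $1 = source image, $2 = name for the flat image
flatten_by_export() {
    src_image=$1
    dst_image=$2
    # Create (but do not start) a container so there is a filesystem to export.
    cid=$(docker create "$src_image") || return 1
    # Stream the tar archive straight into a new single-layer image.
    docker export "$cid" | docker import - "$dst_image"
    status=$?
    # Remove the temporary container whether or not the import worked.
    docker rm "$cid" >/dev/null
    return "$status"
}

# Usage:
#   flatten_by_export rstudio:latest rstudio:flat
```

docker import accepts --change to re-apply instructions such as ENV or CMD, but each one has to be listed by hand, which is why this route gets tedious for images with a lot of metadata.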
#2 docker build --squash
During the build, the --squash option can be specified. This requires a Docker daemon with experimental features turned on, and it squashes only the new layers, not the ones in the base image. The result is very specific and can lead to a smaller image size. It won’t help you if you just want to pull and use some image, though.
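A minimal sketch of a squashed build follows. It first asks the daemon whether experimental features are enabled and gives up early if they are not. The function name, tag and build context are placeholders.

```shell
# Build an image with its new layers squashed into one.
# build_squashed is a hypothetical helper name.
# $1 = tag for the result, $2 = build context (defaults to the current directory)
build_squashed() {
    tag=$1
    context=${2:-.}
    # The daemon reports "true" here when experimental features are enabled.
    experimental=$(docker version --format '{{.Server.Experimental}}') || return 1
    if [ "$experimental" != "true" ]; then
        echo "experimental features are off; --squash is unavailable" >&2
        return 1
    fi
    docker build --squash -t "$tag" "$context"
}

# Usage:
#   build_squashed myimage:squashed .
```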
#3 docker-squash
This Python CLI tool may look dated and it’s not in active development, but it’s still alive and just works. You can clone the GitHub project or use pip (pip3 install docker-squash) to install it. After that, cli.py can be executed with additional options. The interface of the tool is clean and well documented. All layers (or only a specified number of them) are squashed into one and the rest of the image is preserved.
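A typical call might look like the sketch below. It assumes the pip install put a docker-squash command on the PATH, and it uses the tool's -t (tag for the new image) and -f (number of layers to squash) options. The function name and image names are placeholders.

```shell
# Squash an image with docker-squash.
# squash_image is a hypothetical helper name.
# $1 = source image, $2 = tag for the squashed image,
# $3 = optional number of layers to squash (all layers when omitted)
squash_image() {
    src_image=$1
    dst_tag=$2
    layers=${3:-}
    if [ -n "$layers" ]; then
        docker-squash -f "$layers" -t "$dst_tag" "$src_image"
    else
        docker-squash -t "$dst_tag" "$src_image"
    fi
}

# Usage:
#   squash_image jupyter:latest jupyter:flat      # squash everything
#   squash_image jupyter:latest jupyter:flat 10   # squash only the last 10 layers
```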
Squashing can take a while, and how long depends on the image size. I was able to flatten a Jupyter image (6 GB, 93 layers) on a laptop with 4 cores and 16 GB RAM in 15 minutes, which is not a bad result. The resulting image can be loaded back into the daemon and works exactly like the original.
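If the squashed image was written to a tar archive, loading it back and comparing layer counts could look like this. The sketch assumes the archive already exists. The function names, the archive path and the image names are placeholders.

```shell
# Print the number of filesystem layers in an image.
# layer_count and load_and_compare are hypothetical helper names.
layer_count() {
    docker image inspect --format '{{len .RootFS.Layers}}' "$1"
}

# $1 = tar archive with the squashed image,
# $2 = original image name, $3 = squashed image name
load_and_compare() {
    archive=$1
    original=$2
    squashed=$3
    # Load the archive back into the daemon.
    docker load -i "$archive" >/dev/null || return 1
    echo "original: $(layer_count "$original") layers"
    echo "squashed: $(layer_count "$squashed") layers"
}

# Usage:
#   load_and_compare jupyter-flat.tar jupyter:latest jupyter:flat
```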
Author: Luděk Novotný