Union mount
Union mounting is a way of combining multiple file systems into one that appears to contain their combined contents. We already touched on UnionFS and OverlayFS in the previous post, How-to build OCI Image by hands. Now let’s dig a little deeper.
The term file system is overloaded with many meanings:
- It could mean the Linux directory tree, e.g., root(/) or any subtree(/home).
- It could refer to a type of storage format, e.g., EXT3, NTFS, OverlayFS, etc.
- It could denote a partition or logical volume formatted with a specific type of file system.
So essentially union mounting is a way of combining multiple folders into one. I say folders here because Linux has a single directory tree. All external disks, with any number of partitions and any type of storage format, are always mounted into it.
In Docker, the default storage driver is OverlayFS — a type of union file system. It allows the user to overlay one file system on top of another. Changes are recorded in the upper file system, while the lower file system remains unmodified.
mkdir /{upper,lower,working,merged}
mkdir /lower/dir1
touch /lower/dir1/file1.txt
mkdir /upper/dir2
touch /upper/dir2/file2.txt
mkdir /lower/dir3
mkdir /upper/dir3
touch /lower/dir3/file3.txt
touch /upper/dir3/file4.txt
- The /upper and /lower folders represent the upper and lower file systems.
- The /working folder is required by OverlayFS and must be empty.
- The /merged folder is where the upper and lower file systems are unified.
To mount the file system run the following command as root:
mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,workdir=/working /merged
Let’s see the results:
cd /merged
tree .
.
├── dir1
│   └── file1.txt
├── dir2
│   └── file2.txt
└── dir3
    ├── file3.txt
    └── file4.txt
Looks good! Users must work in the /merged folder. All changes are recorded in the upper file system. For example, when you edit /merged/dir1/file1.txt, its content is read from the lower file system and the changed version is then written into the upper file system.
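To make the read and write paths concrete, here is a toy in-memory model in Go. It is only a sketch of the idea, not how the kernel implements it: the overlay type and its read and write methods are hypothetical names invented for this illustration, and each layer is reduced to a map from path to content.

```go
package main

import "fmt"

// overlay is a toy model: each layer maps a path to file content.
// The type and its methods are hypothetical, for illustration only.
type overlay struct {
	lower map[string]string // read-only layer
	upper map[string]string // writable layer
}

// read returns the upper copy if one exists, otherwise the lower one.
func (o *overlay) read(path string) (string, bool) {
	if c, ok := o.upper[path]; ok {
		return c, true
	}
	c, ok := o.lower[path]
	return c, ok
}

// write never touches lower: the new content always lands in upper.
func (o *overlay) write(path, content string) {
	o.upper[path] = content
}

func main() {
	o := &overlay{
		lower: map[string]string{"dir1/file1.txt": "original"},
		upper: map[string]string{},
	}
	o.write("dir1/file1.txt", "edited")

	merged, _ := o.read("dir1/file1.txt")
	fmt.Println("merged:", merged)                   // merged: edited
	fmt.Println("lower:", o.lower["dir1/file1.txt"]) // lower: original
}
```

The merged view returns the edited content while the lower map still holds the original, which mirrors what we see on disk in /upper and /lower.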
But what will happen when we delete a file or a folder?
pwd
/merged
rm -rf dir1/
As expected, dir1 is no longer present inside the merged folder. If we now check the /upper file system, we can see a dir1 entry there, and it is not a folder but a character device. The /lower/dir1 folder itself is untouched.
ls -lia /upper
c--------- dir1
drwxr-xr-x dir2
drwxr-xr-x dir3
In order to support rm and rmdir without changing the lower filesystem, an overlay filesystem needs to record in the upper filesystem that files have been removed. This is done using whiteouts and opaque directories.
A whiteout is created as a character device with 0/0 device number. When a whiteout is found in the upper level of a merged directory, any matching name in the lower level is ignored, and the whiteout itself is also hidden.
A directory is made opaque by setting the xattr trusted.overlay.opaque to "y". Where the upper filesystem contains an opaque directory, any directory in the lower filesystem with the same name is ignored.
Here’s how to work with the opaque xattr. Leave /merged first, otherwise umount fails because the mount point is busy:
cd /
umount /merged
setfattr -n trusted.overlay.opaque -v y /upper/dir3
mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,workdir=/working /merged
cd /merged
tree .
.
├── dir2
│   └── file2.txt
└── dir3
    └── file4.txt
Whiteouts in OCI images
The OCI image spec defines how whiteouts are represented inside image layers:
- A whiteout file is an empty file with a special filename that signifies a path should be deleted.
- A whiteout filename consists of the prefix .wh. plus the basename of the path to be deleted.
- As files prefixed with .wh. are special whiteout markers, it is not possible to create a filesystem which has a file or directory with a name beginning with .wh..
- Once a whiteout is applied, the whiteout itself MUST also be hidden.
- Whiteout files MUST only apply to resources in lower/parent layers.
- Files that are present in the same layer as a whiteout file can only be hidden by whiteout files in subsequent layers.
- In addition to expressing that a single entry should be removed from a lower layer, layers may remove all of the children using an opaque whiteout entry.
- An opaque whiteout entry is a file with the name .wh..wh..opq indicating that all siblings are hidden in the lower layer.
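The naming rules above can be sketched as a small Go helper that classifies a path found in a layer. The function parseWhiteout and its return values are my own invention for this post, not part of any library, and the opaque marker name .wh..wh..opq is the one given in the OCI layer specification.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq" // name taken from the OCI layer spec
)

// parseWhiteout inspects the path of a layer entry (hypothetical helper).
// It returns the path being hidden, whether the entry is an opaque marker
// (hide all children of target), and whether it is a whiteout at all.
func parseWhiteout(entry string) (target string, opaque, ok bool) {
	dir, base := path.Split(entry)
	switch {
	case base == opaqueMarker:
		return path.Clean(dir), true, true
	case strings.HasPrefix(base, whiteoutPrefix):
		return path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix)), false, true
	}
	return "", false, false
}

func main() {
	entries := []string{
		".wh.app",
		"app/src/.wh.main.go",
		"app/.wh..wh..opq",
		"app/src/main.go",
	}
	for _, e := range entries {
		target, opaque, ok := parseWhiteout(e)
		fmt.Printf("%-22s whiteout=%-5v opaque=%-5v target=%q\n", e, ok, opaque, target)
	}
}
```

The opaque marker is checked before the generic prefix because it also starts with .wh. and would otherwise be mistaken for a regular whiteout.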
# layer 1
FROM alpine
# layer 2
RUN mkdir -p /app/src
# layer 3
RUN touch /app/src/main.go
# layer 4
RUN rm -rf /app/
If we take the Dockerfile above, build it, and unpack the layers, we will see the following situation:
tree . -a
.
├── dockerfile
├── oci
│   ├── .wh.app
│   ├── app
│   │   └── src
│   │       └── main.go
│   ├── blobs
│   ├── index.json
│   └── oci-layout
└── oci.tar
Both /app/src and /app/src/main.go are there because they were added in layers (2) and (3), but since we removed /app in layer (4), a whiteout file .wh.app was created.
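A container runtime does not stop at unpacking: it has to apply the whiteouts so that the final root filesystem no longer contains /app. Here is a minimal sketch of that step, with several assumptions of mine: applyLayer is a hypothetical function, a layer is reduced to a list of paths, the alpine base layer is left out, and opaque markers are skipped rather than handled.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// applyLayer applies one layer's entries to the set of visible paths.
// Whiteouts are processed first, so they only remove paths that came
// from lower layers and never entries added by the same layer.
// Hypothetical helper; opaque markers (.wh..wh..opq) are not handled.
func applyLayer(fsys map[string]bool, entries []string) {
	for _, e := range entries {
		dir, base := path.Split(e)
		if !strings.HasPrefix(base, ".wh.") || base == ".wh..wh..opq" {
			continue
		}
		target := path.Join(dir, strings.TrimPrefix(base, ".wh."))
		for p := range fsys {
			if p == target || strings.HasPrefix(p, target+"/") {
				delete(fsys, p) // remove the path and everything below it
			}
		}
	}
	for _, e := range entries {
		if !strings.HasPrefix(path.Base(e), ".wh.") {
			fsys[e] = true // whiteout files themselves stay hidden
		}
	}
}

func main() {
	layers := [][]string{
		{"app", "app/src"},  // layer 2: mkdir -p /app/src
		{"app/src/main.go"}, // layer 3: touch /app/src/main.go
		{".wh.app"},         // layer 4: rm -rf /app/
	}
	fsys := map[string]bool{}
	for i, layer := range layers {
		applyLayer(fsys, layer)
		paths := make([]string, 0, len(fsys))
		for p := range fsys {
			paths = append(paths, p)
		}
		sort.Strings(paths)
		fmt.Printf("after layer %d: %v\n", i+2, paths)
	}
}
```

After layer 3 the set contains app, app/src and app/src/main.go. After layer 4 it is empty, and the .wh.app marker itself never shows up in the result.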
References
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.2_release_notes/technology-preview-file_systems
- https://opensource.com/life/16/10/introduction-linux-filesystems
- https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#whiteouts-and-opaque-directories
- https://github.com/opencontainers/image-spec/issues/24
- https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt
- https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts
This post is part of a series.
- Part 1: Container build tool
- Part 2: How-to build OCI Image by hands
- Part 3: Building OCI images with Go. No run command yet
- Part 4: How to Tar/Untar container layers in Go
- Part 5: Linux kernel namespaces
- Part 6: Mini container runtime in Go
- Part 7: Union mount