Platon on Cloud

Union mount

Building a tool to build OCI container images

Union mounting is a way of combining multiple file systems into one that appears to contain their combined contents. We have already touched on the topic of Union FS and Overlay FS in the previous post, How-to build OCI Image by hands. Now let’s dig a little deeper.

The term file system is overloaded with many meanings:

  • It could mean the Linux directory tree, e.g., root(/) or any subtree(/home).
  • It could refer to a type of storage format, e.g., EXT3, NTFS, OverlayFS, etc.
  • It could denote a partition or logical volume formatted with a specific type of file system.

So, essentially, union mounting is a way of combining multiple folders into one. I say folders here because Linux has a single directory tree: every external disk, with any number of partitions and any type of storage, is always mounted somewhere into it.

Docker's overlay2 storage driver, for years the default on Linux, is built on OverlayFS, a type of union file system. It allows the user to overlay one file system on top of another: changes are recorded in the upper file system, while the lower file system remains unmodified.

mkdir /{upper,lower,working,merged}

mkdir /lower/dir1
touch /lower/dir1/file1.txt

mkdir /upper/dir2
touch /upper/dir2/file2.txt

mkdir /lower/dir3
mkdir /upper/dir3
touch /lower/dir3/file3.txt
touch /upper/dir3/file4.txt

  • The /upper and /lower folders represent the upper and lower file systems.
  • The /working folder is required by OverlayFS; it must be empty and on the same file system as /upper.
  • The /merged folder is where the upper and lower file systems are unified.

To mount the file system run the following command as root:

mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,workdir=/working /merged
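
OverlayFS also accepts several lower directories separated by colons, with the leftmost one on top, which is how a stack of image layers can be expressed. As a rough sketch, the hypothetical helper overlay_opts below (not part of any tool) only assembles the -o option string, so it can be inspected before running mount as root:

```shell
# overlay_opts UPPER WORK LOWER...
# Hypothetical helper: prints the -o option string for an overlay mount.
# The first LOWER given is the topmost lower layer.
overlay_opts() {
    upper=$1
    work=$2
    shift 2
    # Join the remaining arguments with ':' (the first character of IFS).
    lowers=$(IFS=:; printf '%s' "$*")
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lowers" "$upper" "$work"
}

# Reproduces the options used in the mount command above:
overlay_opts /upper /working /lower
# prints: lowerdir=/lower,upperdir=/upper,workdir=/working
```

With more than one lower directory, for example overlay_opts /upper /working /l2 /l1, the result is lowerdir=/l2:/l1,..., where /l2 sits on top of /l1.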

Let’s see the results:

cd /merged

tree .
.
├── dir1
│   └── file1.txt
├── dir2
│   └── file2.txt
└── dir3
    ├── file3.txt
    └── file4.txt

Looks good! Users must work in the /merged folder. All changes are recorded in the upper file system. For example, when you edit /merged/dir1/file1.txt, the file is first copied from the lower file system into the upper one (a copy-up), and the changed version is then written to the upper copy.
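
To make the copy-up idea concrete, here is a user-space imitation of it. This is only an illustration under assumptions: the kernel does this internally and differently, and the copy_up function and its arguments are made up for this example. It needs no mount and no root:

```shell
# copy_up LOWER UPPER RELPATH
# Illustrative only: copy RELPATH from the lower tree into the upper tree
# (creating parent folders) unless the upper tree already has it.
copy_up() {
    cu_lower=$1
    cu_upper=$2
    cu_rel=$3
    # Already copied up: later writes go straight to the upper copy.
    [ -e "$cu_upper/$cu_rel" ] && return 0
    mkdir -p "$cu_upper/$(dirname "$cu_rel")"
    cp -p "$cu_lower/$cu_rel" "$cu_upper/$cu_rel"
}

# "Editing" a file then means: copy it up first, then modify the upper copy.
demo=$(mktemp -d)
mkdir -p "$demo/lower/dir1" "$demo/upper"
echo "original" > "$demo/lower/dir1/file1.txt"
copy_up "$demo/lower" "$demo/upper" dir1/file1.txt
echo "edited" >> "$demo/upper/dir1/file1.txt"
```

After this runs, the file under lower still contains only "original", while the upper copy holds both lines.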

But what will happen when we delete a file or a folder?

pwd
/merged

rm -rf dir1/

As expected, dir1 is no longer present inside the merged folder. If we now check the upper file system, we can see that /upper/dir1 is not a folder but a character device.

ls -lia /upper
c---------   dir1
drwxr-xr-x   dir2
drwxr-xr-x   dir3

In order to support rm and rmdir without changing the lower filesystem, an overlay filesystem needs to record in the upper filesystem that files have been removed. This is done using whiteouts and opaque directories.

A whiteout is created as a character device with 0/0 device number. When a whiteout is found in the upper level of a merged directory, any matching name in the lower level is ignored, and the whiteout itself is also hidden.
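
The definition above translates into a small check. The sketch below assumes GNU stat (its -c option with the %t and %T format sequences); is_whiteout is a name invented for this example:

```shell
# is_whiteout PATH
# Succeeds if PATH is a character device whose device number is 0/0.
is_whiteout() {
    [ -c "$1" ] || return 1
    # GNU stat: %t and %T print the major and minor device numbers in hex.
    [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# Possible usage: list the whiteouts directly inside a folder passed as $1.
list_whiteouts() {
    for entry in "$1"/*; do
        if is_whiteout "$entry"; then
            printf '%s\n' "$entry"
        fi
    done
}
```

Running list_whiteouts /upper after the rm above should print /upper/dir1. An ordinary character device such as /dev/null is not reported, because its device number is not 0/0.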

A directory is made opaque by setting the xattr trusted.overlay.opaque to "y". Where the upper filesystem contains an opaque directory, any directory in the lower filesystem with the same name is ignored.

Here’s how to work with the opaque xattr (leave /merged first, otherwise umount reports that the target is busy):

cd /
umount /merged

setfattr -n trusted.overlay.opaque -v y /upper/dir3

mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,workdir=/working /merged

cd /merged

tree .
.
├── dir2
│   └── file2.txt
└── dir3
    └── file4.txt

Whiteouts in OCI images

The OCI image spec defines how whiteouts must be represented inside image layers.

  • A whiteout file is an empty file with a special filename that signifies a path should be deleted.
  • A whiteout filename consists of the prefix .wh. plus the basename of the path to be deleted.
  • As files prefixed with .wh. are special whiteout markers, it is not possible to create a filesystem which has a file or directory with a name beginning with .wh..
  • Once a whiteout is applied, the whiteout itself MUST also be hidden.
  • Whiteout files MUST only apply to resources in lower/parent layers.
  • Files that are present in the same layer as a whiteout file can only be hidden by whiteout files in subsequent layers.
  • In addition to expressing that a single entry should be removed from a lower layer, layers may remove all of the children using an opaque whiteout entry.
  • An opaque whiteout entry is a file with the name .wh..wh..opq indicating that all siblings are hidden in the lower layer.

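The naming rules above can be written down as two small functions. This is a sketch, not part of any OCI tooling: whiteout_path and opaque_path are names made up here, and the paths are assumed to have no trailing slash.

```shell
# whiteout_path PATH
# Print the whiteout entry that deletes PATH: ".wh." + basename, same folder.
whiteout_path() {
    base=${1##*/}
    case $1 in
        */*) printf '%s/.wh.%s\n' "${1%/*}" "$base" ;;
        *)   printf '.wh.%s\n' "$base" ;;
    esac
}

# opaque_path DIR
# Print the opaque whiteout entry that hides all lower-layer children of DIR.
opaque_path() {
    printf '%s/.wh..wh..opq\n' "$1"
}

whiteout_path app/src/main.go   # prints: app/src/.wh.main.go
whiteout_path app               # prints: .wh.app
opaque_path app/src             # prints: app/src/.wh..wh..opq
```
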
# layer 1
FROM alpine
# layer 2
RUN mkdir -p /app/src
# layer 3
RUN touch /app/src/main.go
# layer 4
RUN rm -rf /app/

If we take the Dockerfile above, build it, and unpack the layers, we will see the following situation:

tree . -a
.
├── dockerfile
├── oci
│   ├── .wh.app
│   ├── app
│   │   └── src
│   │       └── main.go
│   ├── blobs
│   ├── index.json
│   └── oci-layout
└── oci.tar

Both /app/src and /app/src/main.go are there because they were added in layers (2) and (3). Since we removed /app in layer (4), that layer contains a whiteout file, .wh.app, which tells the runtime to hide /app from the layers below.
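
To show what honoring these whiteouts could look like when layers are applied one after another, here is a minimal sketch. It assumes each layer has already been extracted into its own folder, relies on find with -mindepth and -delete (GNU and BSD find), and is not hardened against untrusted layers or file names containing newlines. The name apply_layer is invented for this example.

```shell
# apply_layer LAYER_DIR ROOTFS
# Apply one extracted layer on top of ROOTFS, honoring OCI whiteouts.
# The body runs in a subshell, so its variables do not leak to the caller.
apply_layer() (
    layer=$1
    rootfs=$2
    # 1. Whiteouts only affect lower layers, so handle them before copying.
    (cd "$layer" && find . -name '.wh.?*') | while IFS= read -r wh; do
        dir=${wh%/*}
        name=${wh##*/}
        if [ "$name" = ".wh..wh..opq" ]; then
            # Opaque whiteout: drop everything the lower layers put in dir.
            if [ -d "$rootfs/$dir" ]; then
                find "$rootfs/$dir" -mindepth 1 -delete
            fi
        else
            # Plain whiteout: drop the single path named after the prefix.
            rm -rf "$rootfs/$dir/${name#.wh.}"
        fi
    done
    # 2. Copy the layer's content over the root file system.
    cp -a "$layer/." "$rootfs/"
    # 3. The markers themselves must stay hidden, so remove them again.
    find "$rootfs" -name '.wh.?*' -delete
)
```

Applying folders that hold layers (2), (3) and (4) of the Dockerfile above in order would first create app/src/main.go in the root file system and then remove the whole app folder once the layer with .wh.app is applied.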


This post is part of a series.
