Platon on Cloud

How to build an OCI image by hand

Building a tool to build OCI container images

The Open Container Initiative (OCI) defines standards, or specs, for containers. Back in the early days Docker had its own container and runtime formats.

rkt, a container runtime created by CoreOS as an alternative to Docker, also had its own container format. To make these tools interoperable, the OCI was created; it defines three specifications:

  • the Runtime Specification (runtime-spec),
  • the Image Specification (image-spec)
  • and the Distribution Specification (distribution-spec).

In this article we will explore the image spec with the goal of building our own OCI image without build tools.

Container image layers

Each container image consists of layers, or blobs, and configuration. If we take the contents of a file system and archive it we get a layer. Let’s look at it in action by running docker save alpine > alpine.tar and then unpacking the archive:

docker pull alpine

docker save alpine > alpine.tar

#unpack it
tar xvf alpine.tar

tree .
.
├── 3cc20332140056b331ad58185ab589c085f4e7d79d8c9769533d6a9b95d4b1b0.json
├── e863aefeb0c938ac2eb625d83bb2f5094568ba00a1ca521496a7a98f0e57ba27
│   ├── VERSION
│   ├── json
│   └── layer.tar
├── manifest.json
└── repositories

Inside the e86...a27 folder there is layer.tar; let’s unpack it as well.

tree .
.
├── bin
├── dev
├── etc
├── home
├── lib
├── media
├── mnt
├── opt
├── proc
├── root
├── run
├── sbin
├── srv
├── sys
├── tmp
├── usr
└── var

It looks like a pretty standard Linux file system directory tree.

Each layer is nothing more than a tar archive.
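To make the point that a layer is just a tar archive more concrete, here is a minimal Go sketch that packs a couple of in-memory files into a tar archive and lists the entries back. The file names and contents are made up for illustration, and the helper names buildLayer and listLayer are not part of any tool; a real layer would hold a whole root file system.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// buildLayer packs files (path -> content) into an uncompressed tar archive.
func buildLayer(files map[string]string) ([]byte, error) {
	// Sort the paths so entries are written in a stable order.
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, p := range paths {
		body := files[p]
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     p,
			Mode:     0755,
			Size:     int64(len(body)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listLayer returns the entry names stored in a tar archive.
func listLayer(layer []byte) ([]string, error) {
	var names []string
	tr := tar.NewReader(bytes.NewReader(layer))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	// Hypothetical layer content, for illustration only.
	layer, err := buildLayer(map[string]string{
		"bin/hello":    "#!/bin/sh\necho hello\n",
		"etc/hostname": "demo\n",
	})
	if err != nil {
		panic(err)
	}
	names, err := listLayer(layer)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Running it prints the two entry names, much like tar tf would for an archive on disk.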

Content Addressable Storage

Content Addressable Storage, CAS, is a way of storing information so it can be retrieved based on its content, not its name or location. In CAS a file is assigned a unique value, a hash, that represents the content. This ensures that data is stored only once.

Usually we use a file path to get a file’s content. E.g. we can run cat /etc/abc/xyz.conf to get the content of that file. In the container world we use content-addressable identities to reference files.

#create regular file
echo 'hello' > content-addressable-identity

#get its hash
sha256sum < content-addressable-identity
5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03 -

#rename the file
mv content-addressable-identity 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03

cat 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
hello

Now we use the file’s hash to get its content, as opposed to using a file path.

Layers, or blobs, are named after their SHA-256 sum.

Let’s look at the contents of the alpine.tar archive again and try to get the SHA sum of the first file.

tree . 
.
├── 3cc20332140056b331ad58185ab589c085f4e7d79d8c9769533d6a9b95d4b1b0.json
├── e863aefeb0c938ac2eb625d83bb2f5094568ba00a1ca521496a7a98f0e57ba27
# truncated

sha256sum < 3cc20332140056b331ad58185ab589c085f4e7d79d8c9769533d6a9b95d4b1b0.json
3cc20332140056b331ad58185ab589c085f4e7d79d8c9769533d6a9b95d4b1b0  -

This approach is used to verify integrity. If someone tampers with a file, its hash changes, the mismatch is detected and the file won’t be used.
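The naming and verification ideas above can be sketched in a few lines of Go. This is a toy in-memory store, not how any container engine is implemented; the blobStore type and its put and get methods are hypothetical names chosen for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns the content-addressable identity of data as "sha256:<hex>".
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// blobStore is a toy content-addressable store: digest -> content.
type blobStore map[string][]byte

// put stores data under its own digest and returns that digest.
func (s blobStore) put(data []byte) string {
	d := digest(data)
	s[d] = data
	return d
}

// get returns the blob for a digest, re-hashing it to detect tampering.
func (s blobStore) get(d string) ([]byte, error) {
	data, ok := s[d]
	if !ok {
		return nil, fmt.Errorf("blob %s not found", d)
	}
	if digest(data) != d {
		return nil, fmt.Errorf("blob %s failed verification", d)
	}
	return data, nil
}

func main() {
	store := blobStore{}

	// Same bytes as `echo 'hello'` writes, including the trailing newline.
	d := store.put([]byte("hello\n"))
	fmt.Println(d)

	// Storing identical content again does not create a second entry.
	store.put([]byte("hello\n"))
	fmt.Println("blobs stored:", len(store))

	// Simulate tampering: the content changes but the name does not.
	store[d] = []byte("tampered\n")
	_, err := store.get(d)
	fmt.Println("error after tampering:", err)
}
```

The first line printed is the same hash sha256sum produced for the hello file earlier, prefixed with the algorithm name.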

Union File System

We know that images consist of multiple layers. Those layers need to be merged together to get a unified view of the directory structure.

# dockerfile

FROM alpine
ADD hello /bin/hello

Now if we build this with docker build . -t multilayers we will get an image with 2 layers:

  • base layer provided by alpine image,
  • and a layer containing only /bin/hello file.

docker image inspect multilayers | jq '.[0].RootFS'
{
  "Type": "layers",
  "Layers": [
    "sha256:5f4d9fc4d98de91820d2a9c81e501c8cc6429bc8758b43fcb2cd50f4cab9a324",
    "sha256:86922ebcfe54c8c254f94a8624bbc7233ca6b3d644d0c97c384458508956c6c1"
  ]
}

Container engines rely on different storage drivers to merge layers. The default storage driver in Docker is overlay2.

Overlay2 allows us to create a multi-layered virtual file system, which means multiple layers are overlaid on top of each other. The virtual file system is created from lower and upper layers, and each layer is basically a directory. Lower layers are read-only; the upper layer is read-write.

Merged layer is unified view of Lower and Upper layers

Let’s inspect the image and see the content under the GraphDriver property.

docker image inspect multilayers | jq '.[0].GraphDriver' 
{
  "Data": {
    "LowerDir": "/var/lib/docker/overlay2/b54e0ad78336df458acd8bedb753790352df983a61f24915ae5059cfbdaa9a88/diff",
    "MergedDir": "/var/lib/docker/overlay2/kokkj58zxlqgfte7rbw5mcn4g/merged",
    "UpperDir": "/var/lib/docker/overlay2/kokkj58zxlqgfte7rbw5mcn4g/diff",
    "WorkDir": "/var/lib/docker/overlay2/kokkj58zxlqgfte7rbw5mcn4g/work"
  },
  "Name": "overlay2"
}

# look up content of UpperDir
tree . /var/lib/docker/overlay2/kokkj58zxlqgfte7rbw5mcn4g/diff
.
/var/lib/docker/overlay2/kokkj58zxlqgfte7rbw5mcn4g/diff
└── bin
    └── hello

We can see it contains only the data we added in our dockerfile.

This is why each layer is called a "diff": it contains only the changes made on top of the lower layers.

All in all, layers are nothing special. They are just blobs that are expanded into directories and then mounted together to get a unified view.
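The merging idea can be illustrated with a small Go sketch in which each layer is a map from path to content. It is only a model: the real merge is done by the kernel's overlay file system, and this sketch deliberately ignores details such as whiteout files that mark deletions. The layer type and merge function are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// layer maps a file path to its content; it stands in for a layer directory.
type layer map[string]string

// merge builds a unified view. Layers are applied bottom to top,
// so a file in a higher layer hides the same path in the layers below it.
func merge(layers ...layer) layer {
	merged := layer{}
	for _, l := range layers {
		for path, content := range l {
			merged[path] = content
		}
	}
	return merged
}

func main() {
	// Made-up contents, loosely modeled on the two-layer image above.
	lower := layer{"/bin/sh": "busybox", "/etc/os-release": "alpine"}
	upper := layer{"/bin/hello": "hello binary"}

	merged := merge(lower, upper)

	// Print the unified view in a stable order.
	paths := make([]string, 0, len(merged))
	for p := range merged {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%s -> %s\n", p, merged[p])
	}
}
```

The upper layer only contains /bin/hello, yet the merged view shows all three paths, which mirrors what MergedDir exposes.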

OCI Image Layout

The image we’ve seen before was not OCI compliant. What we saw was a Docker image. Let’s fix that and see what an OCI image looks like.

#simple dockerfile
cat dockerfile 
FROM alpine
CMD echo 'hello world!'

#build it using buildx
docker buildx build -o type=oci,dest=oci.tar .

#unpack it
tar xf oci.tar

tree .
.
├── blobs
│   └── sha256
│       ├── 2ab6241fbe26fe4ce86dba1231bcd2dc73101e75b3f6627e0ca1d6ddfe206632
│       ├── 579b34f0a95bb83b3acd6b3249ddc52c3d80f5c84b13c944e9e324feb86dd329
│       └── ba692c6a83ea13339c26d640da9aaae69b635efb854be1a4a13e563e965c31a4
├── dockerfile
├── index.json
├── oci-layout
└── oci.tar

#You can ignore dockerfile and oci.tar as we created those.

The directory tree you see is called the OCI Image Layout. Here’s what the OCI image spec says about it:

  • The OCI Image Layout is the directory structure for OCI content-addressable blobs and location-addressable references (refs).
  • blobs directory contains content-addressable blobs.
  • oci-layout file contains JSON object with imageLayoutVersion field.
  • index.json is an image index, it contains manifests.

OCI Image format specification

The specification defines an OCI Image consisting of

  • image index, defined in index.json,
  • image manifest, referenced from index.json and stored as a blob,
  • a set of filesystem layers (blobs),
  • and configuration, also stored as a blob.

The image index is simply a list of manifest descriptors.

# index.json 

cat index.json | jq
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:ba692c6a83ea13339c26d640da9aaae69b635efb854be1a4a13e563e965c31a4",
      "size": 480,
      "annotations": {
        "org.opencontainers.image.created": "2023-10-26T15:55:20Z"
      },
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    }
  ]
}

The digest property is the hash of a manifest. And since we are using Content Addressable Storage, we can get the content of a manifest by its hash.
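As a rough sketch of that lookup, the Go program below parses a trimmed copy of the index.json shown above and turns each manifest digest into the path of the blob inside the layout. The struct and function names are assumptions made for this example, and only the fields used in this article are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"strings"
)

// descriptor holds the descriptor fields used in this article.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// index mirrors the top level of index.json.
type index struct {
	SchemaVersion int          `json:"schemaVersion"`
	Manifests     []descriptor `json:"manifests"`
}

// blobPath maps a digest such as "sha256:abc" to "<root>/blobs/sha256/abc".
func blobPath(root, digest string) (string, error) {
	parts := strings.SplitN(digest, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("malformed digest %q", digest)
	}
	return path.Join(root, "blobs", parts[0], parts[1]), nil
}

func main() {
	// Trimmed copy of the index.json from this article.
	const raw = `{
	  "schemaVersion": 2,
	  "manifests": [
	    {
	      "mediaType": "application/vnd.oci.image.manifest.v1+json",
	      "digest": "sha256:ba692c6a83ea13339c26d640da9aaae69b635efb854be1a4a13e563e965c31a4",
	      "size": 480
	    }
	  ]
	}`

	var idx index
	if err := json.Unmarshal([]byte(raw), &idx); err != nil {
		panic(err)
	}
	for _, m := range idx.Manifests {
		p, err := blobPath(".", m.Digest)
		if err != nil {
			panic(err)
		}
		fmt.Println(p)
	}
}
```

The printed path is the file we read next to see the manifest itself.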

# manifest

cat blobs/sha256/ba692c6a83ea13339c26d640da9aaae69b635efb854be1a4a13e563e965c31a4 | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:2ab6241fbe26fe4ce86dba1231bcd2dc73101e75b3f6627e0ca1d6ddfe206632",
    "size": 809
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:579b34f0a95bb83b3acd6b3249ddc52c3d80f5c84b13c944e9e324feb86dd329",
      "size": 3331831
    }
  ]
}

A manifest defines the list of layers our image consists of and the image configuration. At this point it should be really easy to get the content of the configuration.

# configuration 

cat blobs/sha256/2ab6241fbe26fe4ce86dba1231bcd2dc73101e75b3f6627e0ca1d6ddfe206632 | jq
{
  "architecture": "arm64",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "/bin/sh",
      "-c",
      "echo 'hello world!'"
    ],
    "ArgsEscaped": true,
    "OnBuild": null
  },
  "created": "2023-09-28T20:39:34.079909813Z",
  "history": [ ... ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:5f4d9fc4d98de91820d2a9c81e501c8cc6429bc8758b43fcb2cd50f4cab9a324"
    ]
  }
}

As you can see, the image configuration defines, among other things, environment variables, the command and the entrypoint. It also defines a rootfs object that lists the image layers under the diff_ids property. We already know that each layer is a “diff” that contains only the changes made to an image, which explains the name.

Since we use CAS (Content Addressable Storage), diff_ids is a list of layer hashes, with one distinction: each hash is computed over the uncompressed layer archive (the tar file). It is used to verify data integrity once a layer has been decompressed. This differs from the layer digest inside the manifest, which is the hash of the blob as stored, in our case the gzip-compressed archive.
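The difference between the two hashes is easy to see in code. The following Go sketch compresses some bytes with gzip and then computes both values from the compressed blob. The input is placeholder text rather than a real tar archive, and the function names are made up for this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// sha256Digest returns "sha256:<hex>" for the given bytes.
func sha256Digest(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// gzipBytes compresses data, producing the kind of blob a tar+gzip layer is.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// layerIDs takes a gzip-compressed layer blob and returns the manifest digest
// (hash of the compressed bytes) and the diff_id (hash of the uncompressed bytes).
func layerIDs(blob []byte) (layerDigest, diffID string, err error) {
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return "", "", err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return "", "", err
	}
	return sha256Digest(blob), sha256Digest(raw), nil
}

func main() {
	// Placeholder for the bytes of an uncompressed layer.tar.
	raw := []byte("pretend this is a tar archive")
	blob, err := gzipBytes(raw)
	if err != nil {
		panic(err)
	}
	layerDigest, diffID, err := layerIDs(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println("layer digest (manifest):", layerDigest)
	fmt.Println("diff_id (config):       ", diffID)
}
```

The two printed values differ even though they describe the same layer, which is why the manifest and the configuration carry different hashes for it.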

OCI layout

Whenever you hear someone say “multi-platform image”, it means that the image index contains manifests for multiple platforms. Container repository tagging works in a similar way: each tag basically points to a manifest, which in turn points to a set of layers and a configuration. We then use the image index to look up the manifest for the platform we need.
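A platform lookup over an image index could look roughly like the Go sketch below. The index is built by hand with placeholder digests, and the type and function names are assumptions for this example; real clients apply more elaborate matching rules.

```go
package main

import "fmt"

// platform mirrors the "platform" object of a manifest descriptor.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
}

// manifestDescriptor keeps only the fields needed for the lookup.
type manifestDescriptor struct {
	Digest   string    `json:"digest"`
	Platform *platform `json:"platform,omitempty"`
}

// imageIndex mirrors the "manifests" list of index.json.
type imageIndex struct {
	Manifests []manifestDescriptor `json:"manifests"`
}

// selectManifest returns the digest of the first manifest whose platform
// matches the requested OS and architecture.
func selectManifest(idx imageIndex, osName, arch string) (string, bool) {
	for _, m := range idx.Manifests {
		if m.Platform != nil && m.Platform.OS == osName && m.Platform.Architecture == arch {
			return m.Digest, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical two-platform index; the digests are placeholders, not real blobs.
	idx := imageIndex{Manifests: []manifestDescriptor{
		{Digest: "sha256:aaaa", Platform: &platform{Architecture: "amd64", OS: "linux"}},
		{Digest: "sha256:bbbb", Platform: &platform{Architecture: "arm64", OS: "linux"}},
	}}

	digest, ok := selectManifest(idx, "linux", "arm64")
	fmt.Println(digest, ok)
}
```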

Getting to the point

We finally have all the knowledge we need to build our very own OCI image. To keep things simple I will include a single binary in the layer. You can use any binary you want; I’ll use Go to create it. The reason I’m using Go is that it produces a statically linked binary with no runtime dependencies.

package main

import "fmt"

func main() {
    fmt.Println("hello world")
}

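To get the binary we need to build it. The image we are assembling targets linux/arm64, so we set GOOS and GOARCH explicitly in case the build machine runs a different OS or architecture.

GOOS=linux GOARCH=arm64 go build hello-world.go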

Now we need to create our layer’s directory.

mkdir -p layer/bin
cp hello-world ./layer/bin

tree layer
layer
└── bin
    └── hello-world

Now let’s create the OCI image layout. Blobs live under blobs/sha256, so we create that directory up front.

mkdir -p image/blobs/sha256
touch image/index.json
touch image/oci-layout

tree .
.
└── image
    ├── blobs
    │   └── sha256
    ├── index.json
    └── oci-layout

Let’s start by creating a layer out of the layer directory we created above.

cd layer

tar -czvf ../layer.tar.gz *
a bin
a bin/hello-world

cd ..

Now we should have our layer archived into layer.tar.gz. To use that layer inside our config file we need the hash of its uncompressed content:

gunzip < layer.tar.gz | sha256sum
918299505cdc628639a0c0ebc767aeab9419b680aaa3a738d2e5976b0bb6c4e0  -

This gives us the hash of the uncompressed layer, which is what diff_ids expects. We reference it inside the config, under rootfs.diff_ids.

{
  "architecture": "arm64",
  "os": "linux",
  "config": {
    "Env": [
      "PATH=/bin"
    ],
    "Entrypoint": [
      "hello-world"
    ]
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:918299505cdc628639a0c0ebc767aeab9419b680aaa3a738d2e5976b0bb6c4e0"
    ]
  },
  "history": [
    {
      "created_by": "Platon Korzh"
    }
  ]
}

Put this content inside the config.json file. The first thing we need to do is get its hash.

sha256sum < config.json
c81fe38d5f0c3c91a7d0506244b413e9a76ed66b81af882575ada912ee7e9e1b - 

mv config.json image/blobs/sha256/c81fe38d5f0c3c91a7d0506244b413e9a76ed66b81af882575ada912ee7e9e1b

Now let’s get the hash of our layer’s archive.

sha256sum < layer.tar.gz
6cdd002eef9eebc1c8d2f58dc67789b660ed4b3426c1492e65f9d4078cf76838 -

mv layer.tar.gz image/blobs/sha256/6cdd002eef9eebc1c8d2f58dc67789b660ed4b3426c1492e65f9d4078cf76838

The next step is to create image manifest.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 460,
    "digest": "sha256:c81fe38d5f0c3c91a7d0506244b413e9a76ed66b81af882575ada912ee7e9e1b"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 1125888,
      "digest": "sha256:6cdd002eef9eebc1c8d2f58dc67789b660ed4b3426c1492e65f9d4078cf76838"
    }
  ]
}

Inside config.digest we pass the hash of the config file we created earlier, and inside layers[0].digest the hash of our layer’s archive. Each size must be the exact size of the referenced blob in bytes (wc -c < file prints it). Now we save that file as manifest.json and get its hash.
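Since the title of this post mentions building a tool, here is a small Go sketch of how the manifest could be generated instead of typed by hand, so that every size and digest is derived from the blob it describes. It is one possible shape, not a finished tool: the type and function names are made up, and the blobs in main are placeholder bytes rather than the real config.json and layer.tar.gz.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// descriptor points at a blob by media type, size in bytes and digest.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Size      int64  `json:"size"`
	Digest    string `json:"digest"`
}

// manifest models the image manifest fields used in this article.
type manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	Config        descriptor   `json:"config"`
	Layers        []descriptor `json:"layers"`
}

// describe builds a descriptor whose size and digest come from the blob itself.
func describe(mediaType string, blob []byte) descriptor {
	sum := sha256.Sum256(blob)
	return descriptor{
		MediaType: mediaType,
		Size:      int64(len(blob)),
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
	}
}

// buildManifest describes one config blob and one gzip-compressed layer blob.
func buildManifest(config, layer []byte) manifest {
	return manifest{
		SchemaVersion: 2,
		MediaType:     "application/vnd.oci.image.manifest.v1+json",
		Config:        describe("application/vnd.oci.image.config.v1+json", config),
		Layers: []descriptor{
			describe("application/vnd.oci.image.layer.v1.tar+gzip", layer),
		},
	}
}

func main() {
	// Placeholder bytes; a real tool would read config.json and layer.tar.gz from disk.
	m := buildManifest([]byte(`{"os":"linux"}`), []byte("layer bytes"))

	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The same describe helper could then be applied to the serialized manifest to produce the entry that goes into index.json.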

sha256sum < manifest.json
7ecc5d1e321c8143040f8fc57d24d7544408dd4276b5cc1b1ae2bc9652a7dd12 -

mv manifest.json image/blobs/sha256/7ecc5d1e321c8143040f8fc57d24d7544408dd4276b5cc1b1ae2bc9652a7dd12

At the end the image directory should have the following structure.

cd image
tree .
.
├── blobs
│   └── sha256
│       ├── 6cdd002eef9eebc1c8d2f58dc67789b660ed4b3426c1492e65f9d4078cf76838
│       ├── 7ecc5d1e321c8143040f8fc57d24d7544408dd4276b5cc1b1ae2bc9652a7dd12
│       └── c81fe38d5f0c3c91a7d0506244b413e9a76ed66b81af882575ada912ee7e9e1b
├── index.json
└── oci-layout

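At this point all we have to do is put { "imageLayoutVersion": "1.0.0" } inside the oci-layout file and write the index file. The index references the manifest by its digest, and size is again the size of the manifest blob in bytes.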

# index.json

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:7ecc5d1e321c8143040f8fc57d24d7544408dd4276b5cc1b1ae2bc9652a7dd12",
      "size": 250,
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}

And that’s it! We have our OCI image ready! To test it, let’s upload it to Docker Hub using the skopeo tool.

skopeo copy oci:./ docker://platonkorzh/oci-image
Getting image source signatures
Copying blob 6cdd002eef9e done  
Copying config c81fe38d5f done  
Writing manifest to image destination

And the moment of truth:

docker run platonkorzh/oci-image
Unable to find image 'platonkorzh/oci-image:latest' locally
latest: Pulling from platonkorzh/oci-image
Digest: sha256:7ecc5d1e321c8143040f8fc57d24d7544408dd4276b5cc1b1ae2bc9652a7dd12
Status: Downloaded newer image for platonkorzh/oci-image:latest
hello world

This post is part of a series.
