Platon on Cloud

How to Tar/Untar container layers in Go

Building a tool to build OCI container images

Tar is an archival, or “container”, format, meaning it does not provide compression on its own. People use additional tools like Bzip2 or Gzip to compress it.

Logically speaking, a tar file is a linear sequence of entries. An entry consists of a header and a file body. Physically, a tar file consists of a sequence of zero-padded, fixed-size 512-byte blocks.

Source: https://techoverflow.net/2013/03/29/reading-tar-files-in-c/
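To make the block structure concrete, here is a minimal sketch, independent of the tool we are building, that reads the first 512-byte header block of a hypothetical uncompressed example.tar and decodes the name and size fields by hand, following the ustar header layout:

package main

import (
  "bytes"
  "fmt"
  "io"
  "os"
  "strconv"
  "strings"
)

func main() {
  // Hypothetical uncompressed archive, used only for this illustration.
  f, err := os.Open("example.tar")
  if err != nil {
    panic(err)
  }
  defer f.Close()

  // Tar operates on fixed-size 512-byte blocks; the first block of an
  // entry is its header.
  block := make([]byte, 512)
  if _, err := io.ReadFull(f, block); err != nil {
    panic(err)
  }

  // Field offsets follow the ustar header layout: the file name occupies
  // bytes 0-99, and the size is an octal string at bytes 124-135.
  name := string(bytes.TrimRight(block[0:100], "\x00"))
  size, err := strconv.ParseInt(strings.Trim(string(block[124:136]), "\x00 "), 8, 64)
  if err != nil {
    panic(err)
  }

  fmt.Printf("first entry: %s (%d bytes)\n", name, size)
}

In practice we never parse headers by hand; the archive/tar package does this for us, as shown below.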

The code is located on GitHub.

How to Tar a directory

Instead of writing an archive directly to a file, let’s use the io.Pipe function to create a synchronous in-memory pipe. This will give us more flexibility later, when we need to calculate a digest.

import "io"

pipeReader, pipeWriter := io.Pipe()

As mentioned earlier, tar is an archival format, so we need to compress it separately. Go provides the compress/gzip package to handle compression.

import "compress/gzip"

compressed := gzip.NewWriter(pipeWriter)

Finally we create a tar writer.

import "archive/tar"

tw := tar.NewWriter(compressed)

Now it’s a matter of using filepath.WalkDir to list all the files in a directory and add them to our archive. However, there is one caveat - we must add files to the archive in a goroutine. This is because PipeWriter blocks until it has satisfied one or more reads from the PipeReader. If no one reads from the pipe, the write operation will block.
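To illustrate the pattern, here is a minimal, self-contained sketch, separate from our Tar code: the write only completes once the other end is read, and closing the writer delivers io.EOF to the reader.

pr, pw := io.Pipe()

go func() {
  // Without the goroutine, this Write would block forever, since nothing
  // would ever read from the other end of the pipe.
  defer pw.Close() // closing the writer signals io.EOF to the reader
  pw.Write([]byte("payload"))
}()

data, _ := io.ReadAll(pr) // consumes everything written into the pipe
fmt.Println(string(data))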

filepath.WalkDir accepts a source directory to walk and a function to call for each file or directory in the filesystem tree.

In order to populate a tar archive we need to do two things:

  1. Call tw.WriteHeader to write the file’s metadata.
  2. Write the file’s contents to tw.

Header is a structure that contains metadata about the file:

  • file name,
  • permission and mode bits,
  • size,
  • owner’s user id and group id,
  • checksum, file type, etc.

The archive/tar package also provides a helper function, tar.FileInfoHeader, that creates a Header struct from an fs.FileInfo object.

go func(){
  if err := filepath.WalkDir(src, func(path string, d os.DirEntry, err error) error {
    if err != nil {
      return fmt.Errorf("walk: %w", err)
    }

    relPath, err := filepath.Rel(src, path)
    if err != nil {
      return fmt.Errorf("rel: %w", err)
    }

    if relPath == "." {
      return nil
    }

    fi, err := os.Lstat(path)
    if err != nil {
      return fmt.Errorf("lstat: %w", err)
    }

    var link string
    if fi.Mode()&os.ModeSymlink != 0 {
      link, err = os.Readlink(path)
      if err != nil {
        return fmt.Errorf("readlink: %w", err)
      }
    }

    header, err := tar.FileInfoHeader(fi, link)
    if err != nil {
      return fmt.Errorf("header: %w", err)
    }

    header.Name = relPath

    if err := tw.WriteHeader(header); err != nil {
      return fmt.Errorf("write header: %w", err)
    }

    if header.Typeflag == tar.TypeReg {
      file, err := os.Open(path)
      if err != nil {
        return fmt.Errorf("open: %w", err)
      }

      _, err = io.Copy(tw, file)
      file.Close()
      if err != nil {
        return fmt.Errorf("copy: %w", err)
      }
    }

    return nil

  }); err != nil {
    // Propagate the error to the reading side of the pipe.
    pipeWriter.CloseWithError(err)
    return
  }

  // Close in order: the tar writer flushes its trailing blocks, the gzip
  // writer flushes its footer, and closing the pipe signals io.EOF to the reader.
  tw.Close()
  compressed.Close()
  pipeWriter.Close()
}()

I won’t post the full source code here as it’s too big, but you can find it on GitHub; the link is at the top.

Now, in order to write the archive to a file, we can io.Copy from the pipe reader. Assuming our tar-ing function is called Tar and returns the pipe reader, we can use the following code:

import (
  "os"
  "io"
)

var src = "/var/tmp/nginx"
var dst = "nginx.tar.gz"

file, err := os.Create(dst)
if err != nil {
  panic("creating dst file")
}

reader, err := Tar(src)
if err != nil {
  panic("tarring src directory")
}

if _, err := io.Copy(file, reader); err != nil {
  panic("copying archive")
}

reader.Close()
file.Close()

We can also use the go-digest package to compute the archive’s digest while we save it.

import (
  "io"

  "github.com/opencontainers/go-digest"
)

compressedDigester := digest.Canonical.Digester()

...

mWriter := io.MultiWriter(compressedDigester.Hash(), file)

...

if _, err := io.Copy(mWriter, reader); err != nil {
  return err
}

Here, instead of writing the data directly to the file, we use an io.MultiWriter instance to write it to both the file and our hashing object.
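Putting it together, a helper along these lines could save the archive and return its digest in one pass. SaveWithDigest is a hypothetical name; the sketch assumes the Tar function described above and the go-digest import shown earlier.

func SaveWithDigest(src, dst string) (digest.Digest, error) {
	// Tar returns the reading end of the pipe the archive is streamed into.
	reader, err := Tar(src)
	if err != nil {
		return "", fmt.Errorf("tar: %w", err)
	}
	defer reader.Close()

	file, err := os.Create(dst)
	if err != nil {
		return "", fmt.Errorf("create: %w", err)
	}
	defer file.Close()

	compressedDigester := digest.Canonical.Digester()

	// Every byte read from the pipe goes to both the destination file and the hash.
	mWriter := io.MultiWriter(compressedDigester.Hash(), file)
	if _, err := io.Copy(mWriter, reader); err != nil {
		return "", fmt.Errorf("copy: %w", err)
	}

	return compressedDigester.Digest(), nil
}

The returned digest.Digest is a string of the form sha256:<hex>.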

How to Untar an archive

In order to unpack an archive we need to know how it was compressed. We can find the compression method by inspecting the first bytes of the archive.

  • Zip (.zip) starts with 0x50, 0x4b, 0x03, 0x04
  • Gzip (.gz) starts with 0x1f, 0x8b, 0x08
  • bzip2 (.bz2) starts with 0x42, 0x5a, 0x68

type Compression int

const (
	Uncompressed Compression = iota
	Bzip2
	Gzip
)

func DetectCompression(source []byte) Compression {
	for compression, m := range map[Compression][]byte{
		Bzip2: {0x42, 0x5A, 0x68},
		Gzip:  {0x1F, 0x8B, 0x08},
	} {
		if len(source) < len(m) {
			continue
		}
		if bytes.Equal(m, source[:len(m)]) {
			return compression
		}
	}
	return Uncompressed
}

The DecompressStream func takes an io.Reader instance, detects the compression and creates the appropriate reader.

import (
  "bufio"
  "bytes"
  "compress/bzip2"
  "compress/gzip"
  "fmt"
  "io"
)

func DecompressStream(src io.Reader) (io.Reader, Compression, error) {
	buffer := bufio.NewReader(src)
	sig, err := buffer.Peek(10)

	if err != nil {
		return nil, Uncompressed, err
	}

	compression := DetectCompression(sig)

	switch compression {
	case Uncompressed:
		return buffer, Uncompressed, nil
	case Bzip2:
		return bzip2.NewReader(buffer), Bzip2, nil
	case Gzip:
		gzipReader, err := gzip.NewReader(buffer)
		if err != nil {
			return nil, Gzip, err
		}
		return gzipReader, Gzip, nil
	default:
		return nil, Uncompressed, fmt.Errorf("unsupported compression: %d", compression)
	}
}

The untarring function is simple, so I’ll include its full code. We start by decompressing the input stream, an io.Reader instance, and then instantiate a new tar reader by calling tar.NewReader. Afterwards, we call tr.Next in a loop to advance to the next entry in the archive.

import (
  "archive/tar"
  "fmt"
  "io"
  "os"
  "path/filepath"
  "strings"
)

func Untar(src io.Reader, dst string) error {
	decompressed, _, err := DecompressStream(src)
	if err != nil {
		return err
	}

	tr := tar.NewReader(decompressed)
	for {
		header, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return fmt.Errorf("tar read: %w", err)
		}

		header.Name = filepath.Clean(header.Name)

		path := filepath.Join(dst, header.Name)
		fi := header.FileInfo()
		mask := fi.Mode()

		switch header.Typeflag {
		case tar.TypeDir:
			if fi, err := os.Lstat(path); !(err == nil && fi.IsDir()) {
				if err := os.MkdirAll(path, mask); err != nil {
					return fmt.Errorf("mkdir: %w", err)
				}
			}
		case tar.TypeReg:
			file, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, mask)
			if err != nil {
				return fmt.Errorf("open: %w", err)
			}
			if _, err := io.Copy(file, tr); err != nil {
				file.Close()
				return fmt.Errorf("copy: %w", err)
			}
			file.Close()
		case tar.TypeSymlink:
			targetPath := filepath.Join(filepath.Dir(path), header.Linkname)
			if !strings.HasPrefix(targetPath, dst) {
				return fmt.Errorf("symlink %q points outside of %q", header.Linkname, dst)
			}
			if err := os.Symlink(header.Linkname, path); err != nil {
				return fmt.Errorf("symlink: %w", err)
			}
		default:
			return fmt.Errorf("unsupported type: %d", header.Typeflag)
		}
	}

	return nil
}
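
As a usage sketch, reusing the nginx.tar.gz file we saved earlier and a hypothetical destination directory:

archive, err := os.Open("nginx.tar.gz")
if err != nil {
	panic(err)
}
defer archive.Close()

// Make sure the hypothetical destination directory exists before unpacking.
if err := os.MkdirAll("/var/tmp/nginx-out", 0o755); err != nil {
	panic(err)
}

if err := Untar(archive, "/var/tmp/nginx-out"); err != nil {
	panic(err)
}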

This post is part of a series.
