Experimental per-container VM adaptor for `runc` that runs Linux containers on macOS by delegating Linux-specific operations to a tiny guest VM.
Warning
These docs are largely under construction. They serve primarily as a way to organize my own thoughts. If you are interested in learning more, please contact me directly.
- This project explores how far a native macOS `containerd` stack can go by offloading Linux-only behavior to a guest VM.
- Think of runm as “`runc` over vsock”: the Linux-only bits run inside a tiny VM, while everything else remains standard `containerd` plumbing on the host.
- The design mirrors industry patterns (Kata Containers; Apple’s Containerization) while integrating natively with `containerd` and `nerdctl` on macOS.
- `nerdctl run`/`exec` support is solid and by far the best-tested path.
- `containerd`’s native snapshotter currently requires a FUSE workaround via `bindfs`.
- BuildKit and multi-container/pod semantics are not implemented here yet (e.g., a K8s “pod” would require grouping multiple containers into one VM, Kata-style).
- ✅ `nerdctl run` and `nerdctl exec`
- ✅ `nerdctl` container management
- ⚠️ `nerdctl build`: BuildKit integration works, not much else
- 🚧 `ctr`: untested
- 🚧 `kubectl`: untested, needs pod support
- ✅ `containerd`
- ✅ `nerdctl`
- ✅ `buildctl`
- 🚧 `buildkitd`: it runs and functions; I haven't completed a successful test yet
- exit status returned to the host
- bind mounts
- read-only mounts (via `ro=true`)
- `-d` detached mode
- internet access (e.g., reach google.com)
- `-it` interactive mode
- `-e` environment variables
- `-w` working directory
- `-v` volumes
- `-p` ports
- `--tty` pty passthrough
- `-u` user (not yet supported)
- macFUSE (kext)
- bindfs
- Virtualization.framework
- docker
- go
- iTerm2 (Terminal.app also works, but all logs are formatted for iTerm)
This will run a scenario that creates a container via `nerdctl run -d`, then runs a command via `nerdctl exec` that streams stdio back to the host at 1s intervals.
Note
Your password may be required to clean up any leaked processes.
- Clone this repo
- Install all required forks to `../` (`go tool task fork:install:all`)
- Open two panes in iTerm2
- In the first pane, run `go tool task dev:containerd`
- Take a break: the first run takes a while because it builds the kernel
- Wait until `containerd` starts and the logs stop
- In the second pane, run `go tool task dev:2025-07-05:01`
- The second pane will show user-facing logs streaming from the demo container
This project currently depends on many forks. The changes are a mix of required logical changes and (mostly) debugging. Until I have time to put more TLC into them, I will lay out the important logical changes and what they enable.
Note
Current nerdctl and BuildKit logic implicitly assumes that their binaries are built for the same OS as the container runtime. The "[bug]" notes below refer to this assumption, even though it is not truly an upstream bug.
Last upstream sync: 2025-08-06
- `containerd` (diff)
  - [bind mounts] add `darwin` build support for the `core/mount` package by invoking `bindfs` via `exec.Cmd`
  - [rootless] ignore `EPERM` from `Lchown`/`Chown`
- `nerdctl` (diff)
  - [bug] on `darwin/arm64`, use `oci.WithDefaultSpecForPlatform("linux/arm64")` when creating the OCI spec to prevent various "not supported" errors
  - [bug] refactor the mount parsing logic to use Linux-specific logic on macOS
- `buildkit` (diff)
  - [bug] on `darwin/arm64`, generate the OCI spec with an explicit platform (`GenerateSpecWithPlatform(ctx, nil, "linux/arm64", ...)`) to avoid a missing `.Process.Args` inside the OCI spec
  - [bug] on
- `fsutil` (diff)
  - [rootless] ignore `EPERM` from `Lchown`/`Chown`
  - [rootless] ignore
- `gvisor-tap-vsock` (diff)
  - [feature] add raw `net.Listener` port-forwarding support
  - [context/reliability] propagate context where it was missing; return `ctx.Err()` on cancellation
- Host: a `containerd` shim on macOS
- Guest: a tiny Linux VM runs unmodified `runc`, driven by the host over gRPC/vsock
- IO/control: stdio, signals, and exit codes flow over vsock
- Isolation: one micro-VM per container (no shared kernel)
- Networking: per-VM networking via gvisor-tap-vsock

To `containerd`, it looks like a normal OCI runtime; Linux syscalls execute inside the guest.
Rough outline
sequenceDiagram
participant U as nerdctl
participant CD as containerd
participant SH as runm shim (host)
participant VF as Virtualization.framework
participant VM as runm guest (VM)
participant R as runc
participant P as container process
U->>CD: run
CD->>SH: create/start task
SH->>VF: boot micro-VM
VF->>SH: VM ready
SH->>VM: gRPC Create/Start
VM->>R: runc create/start
R->>P: exec process
P->>VM: stdout/stderr, exit
VM->>SH: IO and status
SH->>CD: task state
CD->>U: result
Important
The docs below are even more under construction and incomplete.
macOS does not provide Linux namespaces, cgroups, or overlayfs. Traditional solutions (e.g., Docker Desktop) run a single Linux VM hosting all containers. Runm explores a Kata-like “VM-per-container” design on macOS using Apple’s Virtualization.framework: stronger isolation, clean lifecycles, and a native integration path. The approach aligns with Apple’s Containerization framework direction while remaining OCI- and containerd-oriented.
- Host shim: adapted from `containerd`’s `containerd-shim-runc-v2`. It translates container lifecycle requests into gRPC calls to the guest.
- Guest agent: receives requests over vsock and runs `runc` inside the VM. No changes to `runc` are required beyond extra debugging.
- IO and control: stdio, signals, and exit codes traverse vsock; networking for the guest is provided via gvisor-tap-vsock style forwarding.
- Like Kata Containers, runm boots a minimal guest and runs the workload inside that VM for stronger isolation.
- Like Apple’s Containerization, runm embraces one-VM-per-container on macOS using Virtualization.framework for fast boots and tight integration.
- Unlike Kata’s K8s-focused “one VM per pod” model, runm currently treats each `nerdctl run` as its own sandbox VM.
Runm offloads Linux operations to a lightweight VM launched with Virtualization.framework.
- Custom Linux kernel configuration lives in `./linux/kernel`.
- Static `busybox` is built in `./linux/busybox`.
File system layout:
# initramfs
/init (symlink to `/runm-linux-mounter`)
/runm-linux-mounter
/bin/busybox
`runm-linux-mounter` is the only binary in the initramfs; it mounts the `mbin` squashfs containing the remaining guest binaries.
# rootfs
/bin/busybox
/mbin/runm-linux-init
/mbin/runc-test
/mbin/runm-linux-host-fork-exec-proxy
On Linux, `containerd`’s native snapshotter relies on bind mounts (`mount --bind`). macOS has no `mount --bind`, so we use `bindfs` and a FUSE implementation to simulate bind mounts:
- `bindfs` (OSS, GPL-2) plus either `macFUSE` or `fuse-t` (free to use, closed-source) enables host-side bind-like behavior for the native snapshotter. The `macFUSE` kext requires reduced security mode; `fuse-t` avoids a kext but has proven unstable in practice.
- macOS 15 introduced FSKit for user-space file systems; `macFUSE` v5 advertises support, but it has not worked out of the box here yet.
Notes from experimentation:
- `fuse-t` was significantly less stable (e.g., sporadic missing `glibc` files breaking dynamic linking).
- `macFUSE` (kext) has been much more reliable.