Launch a systemd service within an OCI container (runc)
I am trying to launch a systemd service (avahi-daemon) inside a runc container, and all of my attempts so far have failed. I have found several articles about the same task, but they all describe Docker-based solutions. Has anybody managed to do this with runc?
This is my config.json:
{
"ociVersion": "1.0.0-rc1",
"platform": { "os": "linux", "arch": "arm"
},
"process": { "terminal": false, "user": { "uid": 0, "gid": 0 }, "args": [ "/bin/systemctl", "start", "avahi-daemon" ], "env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm" ], "cwd": "/", "capabilities": { "bounding": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_RAW", "CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE" ], "effective": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_RAW", "CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE" ], "inheritable": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_RAW", "CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE" ], "permitted": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_RAW", "CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE" ], "ambient": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_RAW", "CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE" ] }, "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 } ], "noNewPrivileges": true
},
"root": { "path": "rootfs", "readonly": false },
"hostname": "runc",
"mounts": [ { "destination": "/proc", "type": "proc", "source": "proc" }, { "destination": "/dev", "type": "tmpfs", "source": "tmpfs", "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ] }, { "destination": "/dev/pts", "type": "devpts", "source": "devpts", "options": [ "nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5" ] }, { "destination": "/dev/shm", "type": "tmpfs", "source": "shm", "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=65536k" ] }, { "destination": "/dev/mqueue", "type": "mqueue", "source": "mqueue", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/sys", "type": "sysfs", "source": "sysfs", "options": [ "nosuid", "noexec", "nodev", "ro" ] }, { "destination": "/sys/fs/cgroup", "type": "cgroup", "source": "cgroup", "options": [ "ro" ] }
],
"linux": { "resources": { "devices": [ { "allow": false, "access": "rwm" } ] }, "namespaces": [ { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" } ], "maskedPaths": [ "/proc/kcore", "/proc/latency_stats", "/proc/timer_stats", "/proc/sched_debug" ], "readonlyPaths": [ "/proc/asound", "/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger" ]
}
}
This config file produces the error: "Failed to connect to bus: No such file or directory".
So far I have tried the following:
- Assigning the CAP_SYS_ADMIN capability to the container;
- Executing the "/sbin/init" binary at container startup, which fails with: "Couldn't find an alternative telinit implementation to spawn.";
- Since the init file is a symbolic link to "/lib/systemd/systemd", executing that binary directly, which fails with: "Trying to run as user instance, but the system has not been booted with systemd.".
2 Answers
systemd services do not run standalone – you can only start them if your pid 1 (init) is systemd. In containers, that requires using a pid namespace in addition to what you already have.
(In other words, systemctl doesn't actually read and execute those .service files at all – it only asks pid 1 to start the corresponding daemon.)
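One way to see this from inside the container is to look at what is actually running as PID 1. The snippet below is a minimal Python sketch, not something from the original setup: the function name `booted_with_systemd` and its `root` parameter are made up for illustration. It reads /proc/1/comm and checks for the /run/systemd/system directory, which is the marker that libsystemd's sd_booted() tests for.

```python
import os


def booted_with_systemd(root="/"):
    """Return (pid1_name, booted) for the filesystem tree under `root`.

    `root` is a hypothetical parameter so the check can be pointed at a
    test directory instead of the live filesystem.
    """
    comm_path = os.path.join(root, "proc/1/comm")
    try:
        with open(comm_path) as f:
            pid1_name = f.read().strip()
    except OSError:
        pid1_name = None  # /proc is not mounted or not readable

    # sd_booted() reports true when this directory exists.
    booted = os.path.isdir(os.path.join(root, "run/systemd/system"))
    return pid1_name, booted


if __name__ == "__main__":
    name, booted = booted_with_systemd()
    print("PID 1 is:", name)
    print("systemd is init:", booted)
```

If the second value is False, systemctl has no systemd instance to talk to, which would be consistent with the "Failed to connect to bus" error in the question.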
In general, I'd say your runC setup already duplicates systemd's built-in features (ProtectHome=, CapabilityBoundingSet=, etc.). But if you do want to run the daemon in a dedicated container, you only have two options:
- Run the container with a new PID namespace, with systemd as its main process, and have that systemd instance start avahi-daemon. (systemd-nspawn may work better than runC.)
- Configure the container to launch /usr/bin/avahi-daemon directly, without involving systemctl or the avahi-daemon.service files at all.
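Both options come down to small edits of config.json. The following Python sketch only illustrates those edits on the parsed JSON; the helper names `use_systemd_init` and `use_direct_daemon` are made up. The /sbin/init path is taken from the question and /usr/bin/avahi-daemon from the text above, and the daemon may well live somewhere else (for example under /usr/sbin) in your rootfs. The first option will probably need more than this in practice, such as extra capabilities and cgroup access.

```python
import copy
import json


def use_systemd_init(config, init_path="/sbin/init"):
    """Option 1: add a PID namespace and run systemd as the main process."""
    new = copy.deepcopy(config)
    namespaces = new.setdefault("linux", {}).setdefault("namespaces", [])
    if not any(ns.get("type") == "pid" for ns in namespaces):
        namespaces.insert(0, {"type": "pid"})
    new.setdefault("process", {})["args"] = [init_path]
    return new


def use_direct_daemon(config, daemon_path="/usr/bin/avahi-daemon"):
    """Option 2: run the daemon binary itself, bypassing systemctl and unit files."""
    new = copy.deepcopy(config)
    new.setdefault("process", {})["args"] = [daemon_path]
    return new


if __name__ == "__main__":
    # Trimmed-down stand-in for the question's config.json.
    config = {
        "process": {"args": ["/bin/systemctl", "start", "avahi-daemon"]},
        "linux": {"namespaces": [{"type": "network"}, {"type": "mount"}]},
    }
    print(json.dumps(use_systemd_init(config), indent=2))
    print(json.dumps(use_direct_daemon(config), indent=2))
```

Both helpers return a modified copy, so the original dict can still be written back unchanged if neither variant works out.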
My version of config.json for running systemd within runc:
{
"ociVersion": "1.0.0-rc1",
"platform": { "os": "linux", "arch": "arm"
},
"process": { "terminal": false, "user": { "uid": 0, "gid": 0 }, "args": [ "/sbin/init" ], "env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm" ], "cwd": "/", "capabilities": { "bounding": [ "CAP_KILL", "CAP_CHOWN", "CAP_SETGID", "CAP_SETUID", "CAP_NET_RAW", "CAP_MAC_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_AUDIT_WRITE", "CAP_NET_BIND_SERVICE" ], "effective": [ "CAP_KILL", "CAP_CHOWN", "CAP_SETGID", "CAP_SETUID", "CAP_NET_RAW", "CAP_MAC_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_AUDIT_WRITE", "CAP_NET_BIND_SERVICE" ], "inheritable": [ "CAP_KILL", "CAP_CHOWN", "CAP_SETGID", "CAP_SETUID", "CAP_NET_RAW", "CAP_MAC_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_AUDIT_WRITE", "CAP_NET_BIND_SERVICE" ], "permitted": [ "CAP_KILL", "CAP_CHOWN", "CAP_SETGID", "CAP_SETUID", "CAP_NET_RAW", "CAP_MAC_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_AUDIT_WRITE", "CAP_NET_BIND_SERVICE" ], "ambient": [ "CAP_KILL", "CAP_CHOWN", "CAP_SETGID", "CAP_SETUID", "CAP_NET_RAW", "CAP_MAC_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_AUDIT_WRITE", "CAP_NET_BIND_SERVICE" ] }, "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 } ], "noNewPrivileges": true
},
"root": { "path": "rootfs", "readonly": false },
"hostname": "runc",
"mounts": [ { "destination": "/proc", "type": "proc", "source": "proc", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/dev", "type": "tmpfs", "source": "tmpfs", "options": [ "nosuid", "strictatime", "mode=755" ] }, { "destination": "/dev/pts", "type": "devpts", "source": "devpts", "options": [ "nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5" ] }, { "destination": "/dev/shm", "type": "tmpfs", "source": "shm", "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=65536k" ] }, { "destination": "/dev/mqueue", "type": "mqueue", "source": "mqueue", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/sys", "type": "sysfs", "source": "sysfs", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/sys/fs/cgroup", "type": "bind", "source": "/sys/fs/cgroup", "options": [ "rbind", "ro" ] }
],
"linux": { "resources": { "devices": [ { "allow": false, "access": "rwm" } ] }, "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" } ], "maskedPaths": [ "/proc/kcore", "/proc/latency_stats", "/proc/timer_stats", "/proc/sched_debug" ], "readonlyPaths": [ "/proc/asound", "/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger" ]
}
}
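To make the differences from the config.json in the question easier to spot, here is a rough Python sketch that compares two runc configs on the fields that matter most here: the process arguments, the bounding capabilities, and the namespaces. The helper name `summarize_changes` is made up, and the two dicts are trimmed excerpts of the configs on this page, not complete files.

```python
def summarize_changes(old, new):
    """Return args, added capabilities, and added namespaces of `new` vs `old`."""
    old_caps = set(old["process"]["capabilities"]["bounding"])
    new_caps = set(new["process"]["capabilities"]["bounding"])
    old_ns = {ns["type"] for ns in old["linux"]["namespaces"]}
    new_ns = {ns["type"] for ns in new["linux"]["namespaces"]}
    return {
        "args": (old["process"]["args"], new["process"]["args"]),
        "added_capabilities": sorted(new_caps - old_caps),
        "added_namespaces": sorted(new_ns - old_ns),
    }


# Excerpt of the question's config.json.
QUESTION = {
    "process": {
        "args": ["/bin/systemctl", "start", "avahi-daemon"],
        "capabilities": {"bounding": [
            "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_RAW",
            "CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE",
        ]},
    },
    "linux": {"namespaces": [
        {"type": "network"}, {"type": "ipc"}, {"type": "uts"}, {"type": "mount"},
    ]},
}

# Excerpt of the config.json from this answer.
ANSWER = {
    "process": {
        "args": ["/sbin/init"],
        "capabilities": {"bounding": [
            "CAP_KILL", "CAP_CHOWN", "CAP_SETGID", "CAP_SETUID", "CAP_NET_RAW",
            "CAP_MAC_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT",
            "CAP_AUDIT_WRITE", "CAP_NET_BIND_SERVICE",
        ]},
    },
    "linux": {"namespaces": [
        {"type": "pid"}, {"type": "network"}, {"type": "ipc"},
        {"type": "uts"}, {"type": "mount"},
    ]},
}

if __name__ == "__main__":
    for key, value in summarize_changes(QUESTION, ANSWER).items():
        print(key, "->", value)
```

The mounts differ as well and are not covered by the sketch: /sys is no longer mounted read-only, /proc gains "nosuid", "noexec" and "nodev", the size limit on /dev is dropped, and /sys/fs/cgroup becomes a read-only recursive bind mount of the host's /sys/fs/cgroup instead of a cgroup mount.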