
Issues starting a container in Fedora 25 Alpha (oci-systemd-hook prestart hook fails).

Open pooya opened this issue 9 years ago • 6 comments

This is a vanilla Fedora 25 installation with kernel 4.8.0-0.rc6.git0.1.fc25.x86_64 and docker-1.12.1-12.git9a3752d.fc25.x86_64.

Following the procedure for Fedora 24, I ended up with the following error from containerd:

Sep 18 12:23:55 localhost.localdomain dockerd[11628]: time="2016-09-18T12:23:55.225919746-07:00" level=error msg="Create container failed with error: containerd: container not started"

Looking in the earlier logs I see:

Sep 18 12:19:48 localhost.localdomain oci-systemd-hook[11436]: systemdhook <error>: root not found in state: Success

Which points to the systemd hook in /usr/libexec/oci/hooks.d/oci-systemd-hook (https://github.com/projectatomic/oci-systemd-hook/blob/master/src/systemdhook.c#L659)

Removing that (optional) executable allows the container creation to make progress (the container still errors out immediately but that looks like a separate issue).

pooya avatar Sep 18 '16 20:09 pooya

Hi @pooya - thanks for reporting this. Yes, I think you are correct - if you enable debugging [1], you can see that the runtime is unhappy with the oci-systemd-hook hook.

Whilst we investigate further, the simplest solution to get you going is to do the following:

  • Remove the Fedora-packaged docker:
$ sudo systemctl stop docker
$ sudo dnf -y remove docker
  • Install the docker package from Docker as shown on the wiki here:

    https://github.com/01org/cc-oci-runtime/wiki/Installing-Clear-Containers-on-Fedora-24#2install-docker-112


[1] - https://github.com/01org/cc-oci-runtime#running-under-docker

jodh-intel avatar Sep 19 '16 10:09 jodh-intel

Closer inspection reveals the following:

  • runc passes some state to the hooks including a root element.

    When the hook doesn't see that root element, it generates the systemdhook <error>: root not found in state error.

  • oci-systemd-hook is hard-coded to look for this root element in the state it receives when the runtime invokes the hook. Enabling debug for the runtime shows the following in the logs:

    running hook command '/usr/libexec/oci/hooks.d/oci-systemd-hook' as:
    arg: '/usr/libexec/oci/hooks.d/oci-systemd-hook'
    arg: 'prestart'
    arg: '/var/lib/docker/containers/69184b7bf474f716fa8a9e0c43b9d15f03b7f20429a68a5d522ebac7869ed55e/config.v2.json'
    hook process ('/usr/libexec/oci/hooks.d/oci-systemd-hook') running with pid 2494
    Hook pid 2494 ended with exit status 256
    

    (Note: the runtime's state is passed to the hook over stdin and isn't logged).
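To make the stdin hand-off concrete, here is a minimal Python sketch. It is not code from cc-oci-runtime, runc, or oci-systemd-hook: `run_hook`, `HOOK_SOURCE`, and the sample state values are hypothetical, and the stand-in hook only mimics the behaviour described above by exiting non-zero when it finds no `root` key.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a hook that insists on a "root" key in the state.
HOOK_SOURCE = """
import json, sys
state = json.load(sys.stdin)
if "root" not in state:
    print("root not found in state", file=sys.stderr)
    sys.exit(1)
"""

def run_hook(argv, state):
    """Run a hook command, passing the container state as JSON on stdin.

    Returns the hook's exit code and whatever it wrote to stderr.
    """
    proc = subprocess.run(
        argv,
        input=json.dumps(state),
        text=True,
        capture_output=True,
    )
    return proc.returncode, proc.stderr.strip()

# Illustrative state only; every value here is made up for the example.
sample_state = {
    "ociVersion": "1.0.0",
    "id": "example-container",
    "status": "created",
    "pid": 1234,
    "bundlePath": "/tmp/example-bundle",
    "annotations": {},
}

if __name__ == "__main__":
    hook_argv = [sys.executable, "-c", HOOK_SOURCE]
    print(run_hook(hook_argv, sample_state))
```

Because the state travels over stdin rather than in the argument list, it does not show up in the `arg:` lines of the debug log, which matches the note above.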

The problem here is that runc, the reference implementation of the OCI spec, doesn't appear to be following the spec itself. Quoting from [1],

The state of the container is passed to the hooks over stdin, so the hooks could get the information they need to do their work.

If you look at [2], we find:

The state of a container MUST include, at least, the following properties:

  • ociVersion
  • id
  • status
  • pid
  • bundlePath
  • annotations

The crucial value here is bundlePath. However, looking at [3], we see that runc doesn't pass bundlePath - it passes Root (although just to be confusing, it does list bundlePath and rootfsPath for runc state $container calls).

In summary, this looks like 2 bugs:

  • runc is passing root rather than bundlePath to the hooks.
  • oci-systemd-hook is compounding the problem by requiring the erroneous root value.
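As a rough illustration of the two points above, the following Python sketch checks a state dictionary against the property list quoted from the spec and shows how a hook could accept either key. The helper names and the sample state are hypothetical, and the sketch makes no claim that `root` and `bundlePath` refer to the same directory.

```python
# Properties the quoted spec text says the state MUST include.
REQUIRED_STATE_KEYS = (
    "ociVersion",
    "id",
    "status",
    "pid",
    "bundlePath",
    "annotations",
)

def missing_state_keys(state):
    """Return the required properties that are absent from the state."""
    return [key for key in REQUIRED_STATE_KEYS if key not in state]

def find_path_key(state):
    """Prefer the spec'd bundlePath, but tolerate a state that only has root.

    Returns a (key, value) tuple, or None if neither key is present.
    The caller must still decide what each key means for its purposes.
    """
    for key in ("bundlePath", "root"):
        if key in state:
            return key, state[key]
    return None

# Hypothetical state that carries root instead of bundlePath.
root_only_state = {"id": "example-container", "pid": 1234, "root": "/tmp/example-rootfs"}

if __name__ == "__main__":
    print(missing_state_keys(root_only_state))
    print(find_path_key(root_only_state))
```

A hook written this way would report the missing key instead of failing outright, although the proper fix remains for the runtime to emit the properties the spec requires.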

[1] - https://github.com/opencontainers/runtime-spec/blob/master/config.md#hooks

[2] - https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#state

[3] - https://github.com/opencontainers/runc/blob/master/libcontainer/process_linux.go#L302

jodh-intel avatar Sep 19 '16 13:09 jodh-intel

Removed bug label as this isn't a bug with cc-oci-runtime.

jodh-intel avatar Sep 19 '16 13:09 jodh-intel

If you don't need to run systemd inside your container, you can use the Fedora-packaged version of docker (1.12.1-12.git9a3752d.fc25.x86_64) and simply remove the oci-systemd-hook package:

# install fedora-packaged version of docker
sudo dnf install docker

# remove systemd hook if you don't need to run systemd in a container
sudo dnf remove oci-systemd-hook

jodh-intel avatar Sep 19 '16 15:09 jodh-intel

Thanks for the quick response. I am not running systemd in the container, so just manually removing the hook works for me.

pooya avatar Sep 21 '16 05:09 pooya

Keeping this issue open to remind us to track the two external bugs:

  • [ ] https://github.com/opencontainers/runc/issues/1057
  • [ ] https://github.com/projectatomic/oci-systemd-hook/issues/27

jodh-intel avatar Sep 27 '16 15:09 jodh-intel