gdn fail with runc error in ubuntu 2204 lts

Open xtremerui opened this issue 3 years ago • 3 comments

Description

When running the Concourse binary (using gdn for containerization) on a Google Cloud VM with an OS image from the ubuntu-2204-lts family, we see errors like the one below:

Aug 25 21:56:12 smoke-splendid-earwig concourse[4460]: {"timestamp":"2022-08-25T21:56:12.809930620Z","level":"error","source":"guardian","message":"guardian.create.containerizer-create.runtime-create-failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting \"cgroup\" to rootfs at \"/sys/fs/cgroup\" caused: invalid argument","handle":"a17876d5-647e-492d-6ae2-311b1a56d718","session":"40.3"}}

For comparison, when running Concourse locally with docker compose we don't see the error. The OS image is the same as on the GCP VM:

root@c29ddbf435bd:/src# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

but its kernel is 5.10.47-linuxkit.

Also, when running Concourse with the containerd runtime, which uses runc v1.1.4 directly, we don't see the error either in local Docker or on the GCP VM.

Maybe it is related to the older runc currently used in guardian, which might not work well with the newer kernel in Ubuntu Jammy Jellyfish?

  • Guardian release version: 1.22
  • Linux kernel version: 5.15.0-1016-gcp
  • Concourse version: latest dev
  • Go version: 1.19

xtremerui avatar Aug 26 '22 14:08 xtremerui

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

cf-gitbot avatar Aug 26 '22 14:08 cf-gitbot

This issue is being worked on under the Garden-runc-release/#233 issue

MarcPaquette avatar Sep 06 '22 19:09 MarcPaquette

It looks like this is the same issue that other container runtimes have had with Jammy: https://github.com/containers/podman/issues/12559 .

Jammy uses cgroupv2 in the kernel, and it delegates cgroup authority to sub-processes (like the container runtime) as cgroupv2. runc supports cgroupv2 as of the v1.0.0 release, but gdn also alters cgroups directly using the old v1 schema: https://github.com/cloudfoundry/guardian/blob/8deac7e439aca41e515a74d7c8489081b8961b97/guardiancmd/command_linux.go#L307

This will require some substantial changes in how cgroups are managed in guardian in order to support new distributions that have switched to cgroupv2.

dtimm avatar Oct 19 '22 16:10 dtimm

Some updates:

Concourse with the latest gdn runs successfully on an image with cgroups v1 enabled, based on the gcloud image family ubuntu-2204-lts.

xtremerui avatar Nov 03 '22 21:11 xtremerui

Hi @xtremerui, is this issue still outstanding for you, or did the newer image resolve it?

MarcPaquette avatar Jan 25 '23 14:01 MarcPaquette

@MarcPaquette the image with cgroups v1 enabled works for us. We are still hoping gdn will work on an image where only cgroups v2 is available.

xtremerui avatar Jan 25 '23 22:01 xtremerui

@xtremerui Our team is starting to scope the work to use cgroups v2 only. We'll keep you updated as that work starts to get done.

dsabeti avatar Dec 12 '23 18:12 dsabeti

@dsabeti this is great news! Thank you and the team.

xtremerui avatar Dec 12 '23 21:12 xtremerui

Looking into this, we'd need to get a new stemcell built to allow the usage of cgroup v2. Currently the bosh stemcell builder is forcing us to use v1: https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/57cd1eb14ddebd9666f15e83ecfa18f31350d45f/stemcell_builder/stages/image_install_grub/apply.sh#L89
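A common way for an image to force cgroup v1 on a systemd host is the kernel parameter `systemd.unified_cgroup_hierarchy` set to a false value on the boot command line. Whether the linked stemcell builder line uses exactly this parameter is an assumption here, so check the script itself. The Go sketch below only shows how one might inspect `/proc/cmdline` for that setting; the function name `forcesCgroupV1` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// forcesCgroupV1 reports whether a kernel command line explicitly disables
// systemd's unified cgroup hierarchy. If the parameter appears more than
// once, the last occurrence wins in this sketch.
// The name forcesCgroupV1 is hypothetical, for illustration only.
func forcesCgroupV1(cmdline string) bool {
	forced := false
	for _, field := range strings.Fields(cmdline) {
		key, value, found := strings.Cut(field, "=")
		if !found || key != "systemd.unified_cgroup_hierarchy" {
			continue
		}
		// systemd accepts several spellings for a false boolean.
		switch strings.ToLower(value) {
		case "0", "no", "n", "false", "f", "off":
			forced = true
		default:
			forced = false
		}
	}
	return forced
}

func main() {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/cmdline:", err)
		os.Exit(1)
	}
	fmt.Println("cgroup v1 forced on kernel command line:", forcesCgroupV1(string(data)))
}
```

A result of `false` only means this one parameter is not disabling the unified hierarchy; it says nothing about other mechanisms an image might use.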

I'm working on discussing this with Product Management.

MarcPaquette avatar Jan 08 '24 21:01 MarcPaquette

I'm going to close out this issue, as it's a known issue and we have future plans to resolve it. We're waiting on the Stemcell builds that enable this feature by default.

MarcPaquette avatar Apr 02 '24 15:04 MarcPaquette