cloud-init service failing in uc24

Open sergiocazzolato opened this issue 1 year ago • 14 comments

I see the cloud-init service failing in our uc24 tests on Google. These are some of the errors in the log:

Mar 19 12:38:17 localhost cloud-init[927]: Cannot call Open vSwitch: ovsdb-server.service is not running.
Mar 19 12:38:17 localhost cloud-init[927]: Failed to connect to system bus: Connection refused
Mar 19 12:38:17 localhost cloud-init[927]: Falling back to a hard restart of systemd-networkd.service
Mar 19 12:38:17 mar191214-516237 cloud-init[927]: 2024-03-19 12:38:17,575 - log.py[DEPRECATED]: Growpart's 'mode' key with value '{mode}' is deprecated in 22.2 and scheduled to be re>
Mar 19 12:38:17 mar191214-516237 groupadd[1046]: group added to /var/lib/extrausers/group: name=lxd, GID=1000
Mar 19 12:38:17 mar191214-516237 groupadd[1046]: group added to /var/lib/extrausers/gshadow: name=lxd
Mar 19 12:38:17 mar191214-516237 groupadd[1046]: new group: name=lxd, GID=1000
Mar 19 12:38:17 mar191214-516237 useradd[1052]: failed adding user '', exit code: 10
Mar 19 12:38:17 mar191214-516237 cloud-init[927]: 2024-03-19 12:38:17,698 - util.py[WARNING]: Failed to create user ubuntu
Mar 19 12:38:17 mar191214-516237 cloud-init[927]: 2024-03-19 12:38:17,819 - util.py[WARNING]: Running module users_groups (<module 'cloudinit.config.cc_users_groups' from '/usr/lib/p>
Mar 19 12:38:18 mar191214-516237 cloud-init[927]: 2024-03-19 12:38:18,271 - util.py[WARNING]: Applying SSH credentials failed!

sergiocazzolato avatar Mar 19 '24 13:03 sergiocazzolato

Mar 19 12:38:17 mar191214-516237 useradd[1052]: failed adding user '', exit code: 10

Is it invoked wrong?

@alfonsosanchezbeato is there any issue in the first few lines related to Open vSwitch?

Meulengracht avatar Mar 19 '24 14:03 Meulengracht

Probably a red herring. The actual issue is here:

Mar 19 12:38:17 mar191214-516237 useradd[1052]: failed adding user '', exit code: 10
Mar 19 12:38:17 mar191214-516237 cloud-init[927]: 2024-03-19 12:38:17,698 - util.py[WARNING]: Failed to create user ubuntu
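
For reference, the exit status in the useradd line can be decoded. This is a minimal Python sketch, assuming the meanings listed in the EXIT VALUES section of the useradd(8) manual page. The `explain_useradd_failure` helper and the `USERADD_EXIT_CODES` table are hypothetical names that exist only for this illustration, and the table is a subset of the documented codes.

```python
import re

# Subset of the exit values documented in useradd(8).
# The table name and the helper below are illustrative, not part of any tool.
USERADD_EXIT_CODES = {
    0: "success",
    1: "can't update password file",
    2: "invalid command syntax",
    3: "invalid argument to option",
    4: "UID already in use (and no -o)",
    6: "specified group doesn't exist",
    9: "username already in use",
    10: "can't update group file",
    12: "can't create home directory",
}

EXIT_CODE_RE = re.compile(r"exit code: (\d+)")


def explain_useradd_failure(log_line):
    """Return (exit_code, meaning) for a useradd log line, or None if absent."""
    match = EXIT_CODE_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, USERADD_EXIT_CODES.get(code, "not listed in this sketch")


if __name__ == "__main__":
    line = ("Mar 19 12:38:17 mar191214-516237 useradd[1052]: "
            "failed adding user '', exit code: 10")
    print(explain_useradd_failure(line))
```

Applied to the log line above, this reports exit code 10, which useradd documents as being unable to update the group file.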

valentindavid avatar Mar 19 '24 15:03 valentindavid

@sergiocazzolato Which test exactly has this failure? I have tested manually adding users through cloud-init. But that did not fail for me.

valentindavid avatar Mar 21 '24 10:03 valentindavid

@valentindavid I see it with the test google:ubuntu-core-24-64:tests/main/degraded

It is failing because the cloud-init service fails.

sergiocazzolato avatar Mar 21 '24 11:03 sergiocazzolato

Looked into it a bit more. I am not sure why it needs to do that on Google Cloud, but cloud-init calls netplan apply, and netplan apply is broken.

valentindavid avatar Mar 21 '24 12:03 valentindavid

Normally cloud-init is not started unless it finds configuration for it. In this case, cloud-init was started because it detected that it was running in Google Cloud.

However, the Google Cloud configuration does not work, because the default user that the default cloud-init configuration would create is not valid.

So for now we should fix the test with https://github.com/snapcore/snapd/pull/13742

Later, we should drop the default configuration from Ubuntu as it does not make sense for Ubuntu Core, and provide our own.

valentindavid avatar Mar 21 '24 15:03 valentindavid

I think I'm seeing something similar when trying to install uc24 using maas. It seems to fail when creating the user. I tried executing the failed useradd command later after manually finishing the installation, and got this error:

$ sudo useradd ubuntu --extrausers --comment Ubuntu --groups adm,cdrom,dip,lxd,sudo --shell /bin/bash -m
useradd: cannot lock /etc/group; try again later.

plars avatar Apr 19 '24 19:04 plars

Opened this PR to resolve this https://github.com/snapcore/core-base/pull/216

Meulengracht avatar Apr 22 '24 13:04 Meulengracht

Opened this PR to resolve this https://github.com/snapcore/core-base/pull/216

@Meulengracht Are we sure that the incorrect datasource is being detected? @sergiocazzolato Can you please include the rest of the log? That will help us to be sure.

holmanb avatar Apr 22 '24 19:04 holmanb

@Meulengracht The fact that this command is not supported on ubuntu core is surprising to say the least - every linux distro that cloud-init supports has some variation of this command.

Even if that command failing is a red herring for why this test failed, adding users is a core feature of cloud-init, and therefore a broken cloud-init feature on ubuntu core.

If we want cloud-init to work on ubuntu core, we'll need to better understand how cloud-init is supposed to do things like add users, configure the network, generate ssh keys, etc. The log above included a lot of failures that shouldn't have happened even if cloud-init had detected the correct cloud (and do not happen on ubuntu server cloud images). I suspect that these are symptoms of many of cloud-init's features not actually working on ubuntu core in the first place.

holmanb avatar Apr 22 '24 19:04 holmanb

@Meulengracht The fact that this command is not supported on ubuntu core is surprising to say the least - every linux distro that cloud-init supports has some variation of this command.

Sorry, I have not been entirely clear about this. useradd is supported, but only with the --extrausers switch for adding users. The issue is that useradd is invoked with the --groups switch, which attempts to add the user to groups, and this cannot be done in this specific way on ubuntu core because /etc/group is read-only.
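
To make the distinction concrete, here is a minimal Python sketch of the two invocations. The `build_useradd_argv` helper and its `group_db_writable` parameter are hypothetical and introduced only for illustration. This is neither cloud-init's code nor the change made in the PR, and it only uses the flags that appear in the useradd command quoted earlier in this thread.

```python
import os


def build_useradd_argv(name, groups, comment, shell="/bin/bash",
                       group_db_writable=None, group_file="/etc/group"):
    """Build a useradd command line, dropping --groups if groups can't be edited.

    Hypothetical helper: the writability check is a heuristic for a
    read-only /etc/group, not how useradd itself decides anything.
    """
    if group_db_writable is None:
        # os.access reports False when the file sits on a read-only filesystem.
        group_db_writable = os.access(group_file, os.W_OK)

    argv = ["useradd", name, "--extrausers", "--comment", comment]
    if groups and group_db_writable:
        # Supplementary groups require modifying the group database.
        argv += ["--groups", ",".join(groups)]
    argv += ["--shell", shell, "-m"]
    return argv


if __name__ == "__main__":
    groups = ["adm", "cdrom", "dip", "lxd", "sudo"]
    print(build_useradd_argv("ubuntu", groups, "Ubuntu", group_db_writable=True))
    print(build_useradd_argv("ubuntu", groups, "Ubuntu", group_db_writable=False))
```

With `group_db_writable=False` the sketch yields the same command as the one quoted earlier in this thread, minus the --groups option.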

Even if that command failing is a red herring for why this test failed, adding users is a core feature of cloud-init, and therefore a broken cloud-init feature on ubuntu core.

The issue is mostly the way cloud-init adds users by default is not supported.

If we want cloud-init to work on ubuntu core, we'll need to better understand how cloud-init is supposed to do things like add users, configure the network, generate ssh keys, etc. The log above included a lot of failures that shouldn't have happened even if cloud-init had detected the correct cloud (and do not happen on ubuntu server cloud images). I suspect that these are symptoms of many of cloud-init's features not actually working on ubuntu core in the first place.

I agree on this, and I think we should work towards better understanding between UC and cloud-init, especially with the goal we have of migrating cloud-init into its own snap, and this will most likely require some collaboration between us as well.

Meulengracht avatar Apr 22 '24 20:04 Meulengracht

@Meulengracht The fact that this command is not supported on ubuntu core is surprising to say the least - every linux distro that cloud-init supports has some variation of this command.

Sorry, I have not been entirely clear around this. useradd is supported, but only with --extrausers switch for adding users. The issue is that useradd is invoked with the --groups switch which attempts to add the user to groups, and this cannot be done in this specific way on ubuntu core as /etc/group is read-only.

Thanks for the clarification. If the groups file is read-only, how is useradd (or any other tool) supposed to add users to groups on ubuntu core?

Even if that command failing is a red herring for why this test failed, adding users is a core feature of cloud-init, and therefore a broken cloud-init feature on ubuntu core.

The issue is mostly the way cloud-init adds users by default is not supported.

Cloud-init adding the default user is what you've bumped into in your tests, right? What I'm trying to point out is that this default user isn't the only use-case for cloud-init modifying groups. User-provided configurations can instruct cloud-init to modify the instance to add arbitrary users to the system (with groups). Since this is broken for the default user, I'm pointing out that this whole module is broken by this, not just cloud-init's default user creation.

especially with the goal we have of migrating cloud-init into its own snap, and this will most likely require some collaboration between us as well.

I agree with this in the sense that moving cloud-init out of ubuntu core might make sense. However, it's hard to imagine that in the near term a cloud-init snap would replace the deb package that is shipped on all cloud images today, for many logistical and technical reasons.

holmanb avatar Apr 23 '24 00:04 holmanb

Thanks for the clarification. If the groups file is read-only, how is useradd (or any other tool) supposed to add users to groups on ubuntu core?

Okay, I ran some more tests now that dhcpcd-base is in the core image and compared to how Cloud-init works on UC22.

The logs now look a lot more similar; however, the useradd issue is still present. I see now that the useradd command issued by cloud-init is the same on UC22, which means it may actually be useradd that is broken.

Since --extrausers is behavior that we add ourselves, there is a good chance it has changed and is now broken on UC. At first glance it looks like it is now locking /etc/group. I'm currently pursuing this.

It may have looked like I didn't fully understand the extent of this issue.

Meulengracht avatar Apr 23 '24 05:04 Meulengracht

@holmanb I think we got to the bottom of it, there are two issues:

  1. Cloud-init's default groups are not supported on UC, as adding them requires locking /etc/group in order to modify it. See this PR for our workaround.

  2. useradd seems to have a regression in how it handles this: in UC22 it ignores the groups, while in UC24 it fails. We've reported an LP bug about this: https://bugs.launchpad.net/ubuntu/+source/shadow/+bug/2063200

Meulengracht avatar Apr 23 '24 17:04 Meulengracht

The new version of shadow is available in noble-proposed. I extracted the binary files (including useradd) of the login and passwd packages into core24 (version 20240710). My image with cloud-init boots correctly and cloud-init created the user account successfully. I'm not sure if there is an official and efficient way to verify a proposed image, but I am happy to leave a comment in the bug ticket.

tsunghanliu avatar Aug 28 '24 08:08 tsunghanliu