disk_setup (and probably fs_setup) does not work with nvme drives
Bug report
When partitioning an NVMe SSD, cloud-init appears to assume the wrong naming convention for partitions. It outputs
Failed during disk check for /dev/nvme0n11
whereas the actual partition device name is /dev/nvme0n1p1
Steps to reproduce the problem
disk_setup:
  "/dev/nvme0n1":
    table_type: gpt
    layout: [[5, 82], [95, 83]]
    overwrite: true
sudo cloud-init single --name disk_setup --frequency always
Cloud-init v. 24.1.3-0ubuntu3 running 'single' at Wed, 01 May 2024 11:22:10 +0000. Up 1916.07 seconds.
2024-05-01 11:22:10,488 - util.py[WARNING]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n11
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n11', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n11: not a block device
2024-05-01 11:22:10,511 - util.py[WARNING]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
/usr/bin/lsblk --pairs --output NAME,TYPE,FSTYPE,LABEL /dev/nvme0n1
NAME="nvme0n1" TYPE="disk" FSTYPE="" LABEL=""
NAME="nvme0n1p1" TYPE="part" FSTYPE="swap" LABEL=""
NAME="nvme0n1p2" TYPE="part" FSTYPE="" LABEL=""
Environment details
- Cloud-init version: /usr/bin/cloud-init 24.1.3-0ubuntu3
- Operating System Distribution: https://github.com/Joshua-Riek/ubuntu-rockchip
- Cloud provider, platform or installer type:
cloud-init logs
2024-05-01 11:29:57,240 - cc_disk_setup.py[DEBUG]: Creating new filesystem.
2024-05-01 11:29:57,240 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2024-05-01 11:29:57,249 - cc_disk_setup.py[DEBUG]: Checking /dev/nvme0n1 against default devices
2024-05-01 11:29:57,249 - cc_disk_setup.py[DEBUG]: Manual request of partition 2 for /dev/nvme0n12
2024-05-01 11:29:57,250 - cc_disk_setup.py[DEBUG]: Checking device /dev/nvme0n12
2024-05-01 11:29:57,250 - subp.py[DEBUG]: Running command ['/usr/sbin/blkid', '-c', '/dev/null', '/dev/nvme0n12'] with allowed return codes [0, 2] (shell=False, capture=True)
2024-05-01 11:29:57,251 - cc_disk_setup.py[DEBUG]: Device '/dev/nvme0n12' has check_label='None' check_fstype=None
2024-05-01 11:29:57,251 - cc_disk_setup.py[DEBUG]: Device /dev/nvme0n12 is cleared for formatting
2024-05-01 11:29:57,252 - cc_disk_setup.py[DEBUG]: File system type 'ext4' with label 'data' will be created on /dev/nvme0n12
2024-05-01 11:29:57,252 - subp.py[DEBUG]: Running command ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps'] with allowed return codes [0] (shell=False, capture=True)
2024-05-01 11:29:57,258 - util.py[DEBUG]: Creating fs for /dev/nvme0n1 took 0.018 seconds
2024-05-01 11:29:57,258 - util.py[WARNING]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
2024-05-01 11:29:57,258 - util.py[DEBUG]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 272, in enumerate_disk
info, _err = subp.subp(lsblk_cmd)
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 298, in subp
raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 157, in handle
util.log_time(
File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2827, in log_time
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 1045, in mkfs
if overwrite or device_type(device) == "disk":
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 299, in device_type
for d in enumerate_disk(device, nodeps=True):
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 274, in enumerate_disk
raise RuntimeError(
RuntimeError: Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
2024-05-01 11:29:57,259 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2024-05-01 11:29:57,259 - util.py[DEBUG]: Read 17 bytes from /proc/uptime
2024-05-01 11:29:57,259 - util.py[DEBUG]: cloud-init mode 'single' took 0.147 seconds (0.14)
NVMe devices are not the only storage devices in Linux that name partitions this way. Other examples are SD cards (e.g. /dev/mmcblk0p1), partitioned loop devices (e.g. /dev/loop0p1) and, I think, software RAID devices (e.g. /dev/md0p1).
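To make the convention concrete, here is a minimal Python sketch. `partition_path` is a hypothetical helper, not a cloud-init function, and it assumes the usual Linux rule that a "p" separator is inserted whenever the disk name itself ends in a digit. It is only meant to illustrate the naming, not how cloud-init does or should resolve partitions.

```python
def partition_path(disk: str, number: int) -> str:
    """Return the device node for partition `number` of `disk`.

    Hypothetical helper for illustration only. Assumes the Linux
    convention: disks whose name ends in a digit (nvme0n1, mmcblk0,
    loop0, md0) get a "p" before the partition number; others (sda,
    vdb) do not.
    """
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"


if __name__ == "__main__":
    # Print the first partition name for a few common disk types.
    for disk in ("/dev/sda", "/dev/nvme0n1", "/dev/mmcblk0", "/dev/loop0"):
        print(disk, "->", partition_path(disk, 1))
```

With this rule, partition 2 of /dev/nvme0n1 resolves to /dev/nvme0n1p2 rather than the /dev/nvme0n12 seen in the logs above.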
@nilo85 thanks for filing this issue! It looks like cloud-init needs to update this code to support various different disk types.
@nilo85 hey it looks like the config you shared can't invoke the codepath that caused the issue. Can you please provide the full config? I'm guessing that you tried to share just the relevant part and didn't realize that the error came from the fs format failure path.
confirmed: https://github.com/canonical/cloud-init/pull/5263#issuecomment-2110815684
This was my config. I later removed the "ssd" alias to rule out the alias as the cause.
device_aliases:
  ssd: /dev/nvme0n1
disk_setup:
  ssd:
    table_type: gpt
    layout: [[5, 82], [95, 83]]
fs_setup:
  - label: swap
    device: ssd.1
    filesystem: swap
  - label: data
    device: ssd.2
    filesystem: ext4
Pretty sure it is disk_setup, based on this line in the stack trace?
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 157, in handle
util.log_time(
I tried modifying the Python code and adding debug output, but quickly realised why I don't code in Python (couldn't make sense of it) ;)
But it seems those enumerate methods are shared across fs_setup and disk_setup.
I can give it another try in the next few days to get a better understanding of exactly where we are in the flow.
EDIT: you are probably right. It did create the partition table for me, but no filesystems, so maybe I just assumed disk_setup was the issue because of the filename.
I thought some disk verification step after creating the layout had failed.