disk_setup (and probably fs_setup) does not work with nvme drives #5246

Closed
nilo85 opened this issue May 1, 2024 · 5 comments · Fixed by #5263
Labels: bug (Something isn't working correctly), incomplete (Action required by submitter)

Comments

nilo85 commented May 1, 2024

Bug report

When trying to partition an NVMe SSD, cloud-init appears to assume the wrong naming convention for partitions.
It outputs:

Failed during disk check for /dev/nvme0n11

whereas the actual partition name is /dev/nvme0n1p1.

Steps to reproduce the problem

disk_setup:
  "/dev/nvme0n1":
    table_type: gpt
    layout: [[5, 82], [95, 83]]
    overwrite: true

sudo cloud-init single --name disk_setup --frequency always
Cloud-init v. 24.1.3-0ubuntu3 running 'single' at Wed, 01 May 2024 11:22:10 +0000. Up 1916.07 seconds.
2024-05-01 11:22:10,488 - util.py[WARNING]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n11
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n11', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n11: not a block device
2024-05-01 11:22:10,511 - util.py[WARNING]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device


/usr/bin/lsblk --pairs --output NAME,TYPE,FSTYPE,LABEL /dev/nvme0n1
NAME="nvme0n1" TYPE="disk" FSTYPE="" LABEL=""
NAME="nvme0n1p1" TYPE="part" FSTYPE="swap" LABEL=""
NAME="nvme0n1p2" TYPE="part" FSTYPE="" LABEL=""
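
The lsblk output above already lists the real partition names, so one way to avoid guessing them is to read them from that output. The following is a minimal sketch of parsing the `--pairs` format in Python; the function names `parse_lsblk_pairs` and `partition_names` are made up for illustration and this is not cloud-init's actual code.

```python
import shlex


def parse_lsblk_pairs(output: str) -> list[dict[str, str]]:
    """Parse `lsblk --pairs` output into one dict per device line."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # shlex understands the KEY="value" quoting, including empty values.
        fields = shlex.split(line)
        rows.append(dict(field.split("=", 1) for field in fields))
    return rows


def partition_names(output: str) -> list[str]:
    """Return the NAME of every row whose TYPE is "part"."""
    return [
        row["NAME"]
        for row in parse_lsblk_pairs(output)
        if row.get("TYPE") == "part"
    ]


# Sample copied from the lsblk output in this report.
SAMPLE = '''NAME="nvme0n1" TYPE="disk" FSTYPE="" LABEL=""
NAME="nvme0n1p1" TYPE="part" FSTYPE="swap" LABEL=""
NAME="nvme0n1p2" TYPE="part" FSTYPE="" LABEL=""
'''

print(partition_names(SAMPLE))
```

Reading the names back from lsblk sidesteps the naming convention entirely, at the cost of one extra command per disk.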

Environment details

cloud-init logs

2024-05-01 11:29:57,240 - cc_disk_setup.py[DEBUG]: Creating new filesystem.
2024-05-01 11:29:57,240 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2024-05-01 11:29:57,249 - cc_disk_setup.py[DEBUG]: Checking /dev/nvme0n1 against default devices
2024-05-01 11:29:57,249 - cc_disk_setup.py[DEBUG]: Manual request of partition 2 for /dev/nvme0n12
2024-05-01 11:29:57,250 - cc_disk_setup.py[DEBUG]: Checking device /dev/nvme0n12
2024-05-01 11:29:57,250 - subp.py[DEBUG]: Running command ['/usr/sbin/blkid', '-c', '/dev/null', '/dev/nvme0n12'] with allowed return codes [0, 2] (shell=False, capture=True)
2024-05-01 11:29:57,251 - cc_disk_setup.py[DEBUG]: Device '/dev/nvme0n12' has check_label='None' check_fstype=None
2024-05-01 11:29:57,251 - cc_disk_setup.py[DEBUG]: Device /dev/nvme0n12 is cleared for formatting
2024-05-01 11:29:57,252 - cc_disk_setup.py[DEBUG]: File system type 'ext4' with label 'data' will be created on /dev/nvme0n12
2024-05-01 11:29:57,252 - subp.py[DEBUG]: Running command ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps'] with allowed return codes [0] (shell=False, capture=True)
2024-05-01 11:29:57,258 - util.py[DEBUG]: Creating fs for /dev/nvme0n1 took 0.018 seconds
2024-05-01 11:29:57,258 - util.py[WARNING]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
2024-05-01 11:29:57,258 - util.py[DEBUG]: Failed during filesystem operation
Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 272, in enumerate_disk
    info, _err = subp.subp(lsblk_cmd)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 298, in subp
    raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 157, in handle
    util.log_time(
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2827, in log_time
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 1045, in mkfs
    if overwrite or device_type(device) == "disk":
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 299, in device_type
    for d in enumerate_disk(device, nodeps=True):
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 274, in enumerate_disk
    raise RuntimeError(
RuntimeError: Failed during disk check for /dev/nvme0n12
Unexpected error while running command.
Command: ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/nvme0n12', '--nodeps']
Exit code: 32
Reason: -
Stdout:
Stderr: lsblk: /dev/nvme0n12: not a block device
2024-05-01 11:29:57,259 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2024-05-01 11:29:57,259 - util.py[DEBUG]: Read 17 bytes from /proc/uptime
2024-05-01 11:29:57,259 - util.py[DEBUG]: cloud-init mode 'single' took 0.147 seconds (0.14)
nilo85 added the bug and new labels May 1, 2024
dermotbradley (Contributor) commented

NVMe devices are not the only storage devices in Linux that use this different naming scheme for partitions. Other examples are SD cards (e.g. /dev/mmcblk0p1), partitioned loop devices (e.g. /dev/loop0p1) and, I think, software RAID devices (e.g. /dev/md0p1).
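
All of the device names above follow the same pattern: when the disk name ends in a digit, a "p" separator goes between the disk name and the partition number. Here is a minimal sketch of that rule in Python. The helper name `partition_path` is hypothetical and this is not necessarily how cloud-init resolved it in #5263. It assumes a plain kernel device path rather than a symlink such as those under /dev/disk/by-id.

```python
def partition_path(device: str, partition: int) -> str:
    """Build a partition device path from a disk path and partition number.

    Assumption: `device` is a kernel device node like /dev/sda or
    /dev/nvme0n1, not a udev symlink.
    """
    if not device:
        raise ValueError("device must be a non-empty path")
    if partition < 1:
        raise ValueError("partition numbers start at 1")
    # Disk names ending in a digit need a "p" so the partition number
    # cannot be confused with part of the disk name.
    separator = "p" if device[-1].isdigit() else ""
    return f"{device}{separator}{partition}"


for disk in ("/dev/sda", "/dev/nvme0n1", "/dev/mmcblk0", "/dev/loop0", "/dev/md0"):
    print(disk, "->", partition_path(disk, 1))
```

Plain string concatenation is what produces the invalid /dev/nvme0n11 and /dev/nvme0n12 paths seen in the logs.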

holmanb removed the new label May 3, 2024

holmanb (Member) commented May 3, 2024

@nilo85 thanks for filing this issue! It looks like cloud-init needs to update this code to support various different disk types.

holmanb (Member) commented May 14, 2024

@nilo85 hey it looks like the config you shared can't invoke the codepath that caused the issue. Can you please provide the full config? I'm guessing that you tried to share just the relevant part and didn't realize that the error came from the fs format failure path.

holmanb added the incomplete label May 14, 2024
confirmed: #5263 (comment)

nilo85 (Author) commented May 15, 2024

This was my config. I later removed the "ssd" alias to rule out the alias as the cause.

device_aliases:
  ssd: /dev/nvme0n1

disk_setup:
  ssd:
    table_type: gpt
    layout: [[5, 82], [95, 83]]

fs_setup:
- label: swap
  device: ssd.1
  filesystem: swap
- label: data
  device: ssd.2
  filesystem: ext4

I'm fairly sure it is disk_setup, based on this line in the stack trace:

File "/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py", line 157, in handle
    util.log_time(

I tried modifying the Python code and adding debug output, but quickly realised why I don't code in Python (I couldn't make sense of it) ;)

But it seems those enumerate methods are shared across fs_setup and disk_setup.

I can give it another try in the next few days to get a better understanding of exactly where we are in the flow.

EDIT: you are probably right. It did create the partition table for me, but no filesystems, so maybe I just assumed disk_setup was the issue because of the filename.

I thought it was some disk verification step after creating the layout that failed.
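
For anyone following the flow: the `device: ssd.2` entries in the config above combine an alias with a partition number, which matches the "Manual request of partition 2" line in the logs. Below is a rough sketch, in Python, of how such a reference could be split and the alias resolved. The function name `split_device_reference` and its return shape are assumptions for illustration, not cloud-init's implementation.

```python
from typing import Optional


def split_device_reference(
    ref: str, aliases: dict[str, str]
) -> tuple[str, Optional[int]]:
    """Split "<device>.<partition>" and resolve the device through aliases.

    Returns (device, partition); partition is None when `ref` carries no
    numeric ".<n>" suffix.
    """
    name, dot, suffix = ref.rpartition(".")
    if dot and name and suffix.isdigit():
        # Unknown names pass through unchanged so real paths still work.
        return aliases.get(name, name), int(suffix)
    return aliases.get(ref, ref), None


# Alias mapping taken from the device_aliases section of the config.
ALIASES = {"ssd": "/dev/nvme0n1"}

print(split_device_reference("ssd.1", ALIASES))
print(split_device_reference("ssd.2", ALIASES))
print(split_device_reference("ssd", ALIASES))
```

The partition path for fs_setup then has to be built from the resolved device and the partition number, and that is the step where the NVMe naming convention matters.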
