Block Device Analyzer excludes RAID block devices

Open diamonwiggins opened this issue 3 years ago • 2 comments

Bug Description

The blockDevices analyzer will only pass if the device type is disk or part. This excludes devices of type raid* as well as others that a user may intend to use.

Per our documentation, block devices with any of the following characteristics are not counted:

  • Devices with a filesystem
  • Partitioned devices
  • Read-only devices
  • Loopback devices
  • Removable devices
# This device does not meet criteria solely because it is type raid1
  {
    "name": "md5",
    "kernel_name": "md5",
    "parent_kernel_name": "nvme1n1p5",
    "type": "raid1",
    "major": 9,
    "minor": 5,
    "size": 404967391232,
    "filesystem_type": "",
    "mountpoint": "",
    "serial": "",
    "read_only": false,
    "removable": false
  }
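The exclusion rules above can be sketched as a small check. This is a minimal illustration based only on the criteria listed in this issue, not the actual Troubleshoot implementation; the function name `rejection_reasons` and the way partitioning is detected (another device naming this one as its parent) are assumptions made for the example.

```python
from typing import Dict, List

# Per this issue, only these device types can currently pass.
ACCEPTED_TYPES = {"disk", "part"}


def rejection_reasons(device: Dict, all_devices: List[Dict]) -> List[str]:
    """Return every reason a device would not be counted (empty list = counted)."""
    reasons = []
    if device["type"] not in ACCEPTED_TYPES:
        reasons.append(f"type {device['type']!r} is not disk or part")
    if device["filesystem_type"]:
        reasons.append("has a filesystem")
    # Assumption: a device is "partitioned" if some other device lists it as parent.
    if any(d.get("parent_kernel_name") == device["kernel_name"] for d in all_devices):
        reasons.append("is partitioned")
    if device["read_only"]:
        reasons.append("is read-only")
    if device["type"] == "loop":
        reasons.append("is a loopback device")
    if device["removable"]:
        reasons.append("is removable")
    return reasons


# The device from the collector output shown in this issue.
md5 = {
    "name": "md5",
    "kernel_name": "md5",
    "parent_kernel_name": "nvme1n1p5",
    "type": "raid1",
    "major": 9,
    "minor": 5,
    "size": 404967391232,
    "filesystem_type": "",
    "mountpoint": "",
    "serial": "",
    "read_only": False,
    "removable": False,
}

print(rejection_reasons(md5, [md5]))
```

Under these assumptions the only reason reported for `md5` is its type, which matches the point made above: none of the documented exclusions apply to it.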

At the very least our documentation isn't clear which device types are allowed and which aren't.

The use case for this analyzer has traditionally been to ensure that a raw, unformatted block device is available before a persistent storage provider that requires one is installed. For example, providers like Rook+Ceph and OpenEBS consume a raw block device for their storage implementations. With these storage providers, using a RAID device is usually unsupported or not recommended, because replication and other storage features are handled by the provider itself.

However, this precise use case is not clear to would-be users of this analyzer. We should either update our documentation, and perhaps even the name of the analyzer, to make this clear, or leave flexibility in the configuration of the blockDevices analyzer for device types, e.g.:

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: block-devices
spec:
  hostCollectors:
    - blockDevices: {}
  hostAnalyzers:
    - blockDevices:
        includeUnmountedPartitions: true
        minimumAcceptableSize: 10737418240 # 1024 ^ 3 * 10, 10GiB
        acceptedDeviceTypes: # this property does not exist today, but something like it could be added.
            - disk
            - part
            - raid
        outcomes:
        - pass:
            when: ".* == 1"
            message: One available block device
        - pass:
            when: ".* > 1"
            message: Multiple available block devices
        - fail:
            message: No available block devices
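One way the proposed `acceptedDeviceTypes` list could be interpreted is sketched below. The property does not exist today, so everything here is hypothetical: the function name `type_is_accepted`, the default of `disk` and `part`, and the rule that an entry such as `raid` also matches numbered variants like `raid1` or `raid10` are all assumptions for illustration.

```python
import re
from typing import Iterable

# Assumed default, mirroring the behavior described in this issue.
DEFAULT_ACCEPTED = ("disk", "part")


def type_is_accepted(device_type: str, accepted: Iterable[str] = DEFAULT_ACCEPTED) -> bool:
    """Return True if device_type matches an accepted entry.

    Assumption: an entry matches itself, optionally followed by digits,
    so "raid" covers "raid0", "raid1", "raid10", and so on.
    """
    for entry in accepted:
        if re.fullmatch(re.escape(entry) + r"\d*", device_type):
            return True
    return False


configured = ["disk", "part", "raid"]
print(type_is_accepted("raid1", configured))  # accepted with the proposed list
print(type_is_accepted("raid1"))              # rejected with the assumed default
```

Matching on a base name plus digits keeps the configuration short, at the cost of not letting a user accept `raid1` while rejecting `raid0`. Exact matching would be the stricter alternative.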

Expected Behavior

Docs should be updated to reflect more clearly the use case of the blockDevices analyzer, or it should allow you to have more flexibility over device type.

Steps To Reproduce

Use the blockDevices analyzer with a raw unformatted block device of type raid*:

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: block-devices
spec:
  hostCollectors:
    - blockDevices: {}
  hostAnalyzers:
    - blockDevices:
        includeUnmountedPartitions: true
        minimumAcceptableSize: 10737418240 # 1024 ^ 3 * 10, 10GiB
        outcomes:
        - pass:
            when: "raid* == 1"
            message: One available block device
        - fail:
            message: No available block devices

Additional Context

Impacts all versions of Troubleshoot since the blockDevices analyzer was added.

diamonwiggins, Jan 13 '23 17:01

I've found another use case: I think I have SCSI devices that, because they are hot-swappable, get marked as removable and are not counted.

Example:

root@mirko ~ # lshw -class disk
  *-disk:0
       description: ATA Disk
       product: ST4000NM0024-1HT
       physical id: 0
       bus info: scsi@1:0.0.0
       logical name: /dev/sda
       version: SN06
       serial: Z4F0NYTH
       size: 3726GiB (4TB)
       capabilities: removable
       configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096
     *-medium
          physical id: 0
          logical name: /dev/sda
          size: 3726GiB (4TB)
  *-disk:1
       description: ATA Disk
       product: HGST HMS5C4040BL
       physical id: 1
       bus info: scsi@2:0.0.0
       logical name: /dev/sdb
       version: A5D0
       serial: PL1331LAGRNU6H
       size: 3726GiB (4TB)
       capabilities: removable
       configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096
     *-medium
          physical id: 0
          logical name: /dev/sdb
          size: 3726GiB (4TB)
  *-namespace:0
       description: NVMe disk
       physical id: 0
       logical name: hwmon0
  *-namespace:1
       description: NVMe disk
       physical id: 2
       logical name: /dev/ng0n1
  *-namespace:2
       description: NVMe disk
       physical id: 1
       bus info: nvme@0:1
       logical name: /dev/nvme0n1
       size: 476GiB (512GB)
       capabilities: partitioned partitioned:dos
       configuration: logicalsectorsize=512 sectorsize=512 signature=199be9e3 wwid=eui.002538c5710042de

chris-sanders, Feb 14 '23 23:02

The plot thickens: this might be harder than I thought. lsblk has added code to distinguish between Hotplug and Removable, which might provide some insight into options: https://github.com/util-linux/util-linux/pull/2011/files

chris-sanders, Feb 15 '23 01:02