Ceph OSD fails because /dev/sdX device names change across reboots.
Is this a request for help?: No
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Version of Helm and Kubernetes: Helm 2.11.0, Kubernetes 1.11.6
Which chart: ceph-helm
What happened: The servers are configured with SAS controllers and an onboard ATA controller, i.e. two sets of SSD/HDD controllers. Across reboots the /dev/ names of the drives changed, e.g. the drive on SAS controller port 1 became /dev/sdc after a reboot when it had been /dev/sda before. This is not uncommon. The values.yaml file was configured to avoid this situation by using by-path names rather than /dev/sdX values:
osd_devices:
  - name: nvsedcog-osd-1
    device: /dev/disk/by-path/pci-0000:00:11.4-ata-1
    journal: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
  - name: nvsedcog-osd-2
    device: /dev/disk/by-path/pci-0000:00:11.4-ata-3
    journal: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
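For reference, each by-path entry above is just a symlink to whatever /dev/sdX name the kernel assigned on that boot; a quick way to check the current mapping (using the first device path from the values above) is:

# Resolve a by-path symlink to the kernel device it currently points at;
# the by-path name is stable across reboots, the /dev/sdX target is not.
readlink -f /dev/disk/by-path/pci-0000:00:11.4-ata-1

# List every by-path symlink and its current /dev/sdX target.
ls -l /dev/disk/by-path/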
What you expected to happen: _osd_disk_activate.sh.tpl and _osd_disk_prepare.sh.tpl should have resolved the by-path name with readlink and used the corresponding /dev/sdX device.
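A minimal sketch of that expected resolution step, assuming a hypothetical OSD_DEVICE variable holding the value from values.yaml (this illustrates the readlink approach only and is not the actual chart template), could look like:

#!/bin/bash
# Illustrative sketch only: canonicalise a possibly-symlinked device path
# (e.g. a /dev/disk/by-path/... name) before the prepare/activate logic
# touches it. OSD_DEVICE and RESOLVED_DEVICE are hypothetical names.
OSD_DEVICE="/dev/disk/by-path/pci-0000:00:11.4-ata-1"

# readlink -f follows the symlink chain and prints the real path,
# e.g. /dev/sda on one boot and /dev/sdc on the next.
RESOLVED_DEVICE="$(readlink -f "${OSD_DEVICE}")"

if [ ! -b "${RESOLVED_DEVICE}" ]; then
  echo "ERROR: ${OSD_DEVICE} does not resolve to a block device" >&2
  exit 1
fi

echo "Using ${RESOLVED_DEVICE} for ${OSD_DEVICE}"
# ...subsequent prepare/activate logic would then operate on ${RESOLVED_DEVICE}...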
How to reproduce it (as minimally and precisely as possible):
A SAS controller is not necessary. Given three drives /dev/sda, /dev/sdb, and /dev/sdc, install Ceph on /dev/sda and /dev/sdc. Shut down the server and remove /dev/sdb. On restart, osd.1, i.e. the OSD that was attached to /dev/sdc, will fail because the device has been renamed.
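As a diagnostic aid (not part of the chart), the renaming can be confirmed after the reboot by correlating kernel names with stable identifiers:

# The NAME column shifts across the reboot while SERIAL stays constant,
# which is why /dev/sdX cannot be relied on.
lsblk -o NAME,SERIAL,SIZE,TYPE

# The by-path symlinks still point at the correct disks.
ls -l /dev/disk/by-path/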
Anything else we need to know: I'm attaching the "fixes" I made to support by-path names in the values.yaml file: