Ceph OSD fails because /dev/sdX device names change across reboots.
Is this a request for help?: No
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Version of Helm and Kubernetes: Helm 2.11.0, Kubernetes 1.11.6
Which chart: ceph-helm
What happened: The servers are configured with SAS controllers and an onboard ATA controller, i.e. two sets of SSD/HDD controllers. Across reboots the /dev/ names of the drives changed, e.g. the drive on SAS controller port 1 became /dev/sdc after a reboot when it had been /dev/sda before. This is not uncommon. The values.yaml file was configured to avoid this situation by using by-path names rather than /dev/sdX values:
osd_devices:
  - name: nvsedcog-osd-1
    device: /dev/disk/by-path/pci-0000:00:11.4-ata-1
    journal: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
  - name: nvsedcog-osd-2
    device: /dev/disk/by-path/pci-0000:00:11.4-ata-3
    journal: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
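For reference, each by-path entry above is just a symlink to whatever /dev/sdX name the kernel assigned on that boot; a quick way to check the current mapping (using the first device path from the values above) is:

# Resolve a by-path symlink to the kernel device it currently points at;
# the by-path name is stable across reboots, the /dev/sdX target is not.
readlink -f /dev/disk/by-path/pci-0000:00:11.4-ata-1

# List every by-path symlink and its current /dev/sdX target.
ls -l /dev/disk/by-path/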
What you expected to happen: _osd_disk_activate.sh.tpl and _osd_disk_prepare.sh.tpl should have resolved the by-path name with readlink and used the corresponding /dev/sdX device.
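A minimal sketch of that expected resolution step, assuming a hypothetical OSD_DEVICE variable holding the value from values.yaml (this illustrates the readlink approach only and is not the actual chart template), could look like:

#!/bin/bash
# Illustrative sketch only: canonicalise a possibly-symlinked device path
# (e.g. a /dev/disk/by-path/... name) before the prepare/activate logic
# touches it. OSD_DEVICE and RESOLVED_DEVICE are hypothetical names.
OSD_DEVICE="/dev/disk/by-path/pci-0000:00:11.4-ata-1"

# readlink -f follows the symlink chain and prints the real path,
# e.g. /dev/sda on one boot and /dev/sdc on the next.
RESOLVED_DEVICE="$(readlink -f "${OSD_DEVICE}")"

if [ ! -b "${RESOLVED_DEVICE}" ]; then
  echo "ERROR: ${OSD_DEVICE} does not resolve to a block device" >&2
  exit 1
fi

echo "Using ${RESOLVED_DEVICE} for ${OSD_DEVICE}"
# ...subsequent prepare/activate logic would then operate on ${RESOLVED_DEVICE}...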
How to reproduce it (as minimally and precisely as possible):
A SAS controller is not necessary. Given three drives /dev/sda, /dev/sdb, and /dev/sdc, install Ceph on /dev/sda and /dev/sdc. Shut down the server and remove /dev/sdb. On restart, osd.1, i.e. the OSD that was attached to /dev/sdc, will fail because the device has been renamed.
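As a diagnostic aid (not part of the chart), the renaming can be confirmed after the reboot by correlating kernel names with stable identifiers:

# The NAME column shifts across the reboot while SERIAL stays constant,
# which is why /dev/sdX cannot be relied on.
lsblk -o NAME,SERIAL,SIZE,TYPE

# The by-path symlinks still point at the correct disks.
ls -l /dev/disk/by-path/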
Anything else we need to know: I'm attaching the "fixes" I made to support by-path names in the values.yaml file: