Resource stuck in DELETING after an operation times out
I once got an error while removing a resource from one node:
Description:
Failed to delete volume [hosting-vol-data-web-hc1-wd11-0_00000]
Cause:
External command timed out
Additional information:
External command: lvremove -f data/hosting-vol-data-web-hc1-wd11-0_00000
A little later I checked LVM on the node: the volume had been fully removed, but LINSTOR still showed it as DELETING.
After I restarted the linstor-satellite service, it disappeared completely.
"External command timed out" means that the lvremove utility was stuck for longer than what LINSTOR's timeout for running external commands allows (the default is 45 seconds). The reason for that is normally outside of LINSTOR, in this case probably some problem with LVM.
LVM can become extremely slow if the filters are not configured correctly, because it attempts to scan various kinds of block devices for physical volumes, etc. - and sometimes that includes existing DRBD devices.
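To make the filter point concrete, here is a minimal sketch of a check for whether an lvm.conf-style file contains an uncommented filter or global_filter entry that rejects DRBD devices. The function name has_drbd_reject_filter is hypothetical, and the pattern assumes the common "r|/dev/drbd.*|" form on a single line; it does not validate the rest of the LVM configuration.

```shell
# Hypothetical helper (not part of LINSTOR or LVM): succeeds if the given
# lvm.conf-style file has an uncommented filter/global_filter line that
# contains a reject rule ("r<delim>...drbd...") for DRBD devices.
has_drbd_reject_filter() {
    conf_file="$1"
    # Commented lines start with '#', so they do not match the anchored pattern.
    grep -Eq '^[[:space:]]*(global_)?filter[[:space:]]*=.*"r.[^"]*drbd' "$conf_file"
}
```

On a node this could be run as has_drbd_reject_filter /etc/lvm/lvm.conf && echo "DRBD devices are filtered out", assuming the filters are set in that file.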
@raltnoeder I know that. I'm just saying that the volume was removed a little later, but the satellite did not notice this until it was restarted.
I just ran into this problem again: a resource may get stuck in DELETING if the deletion is run while it is RESIZING. Or the other way around; I don't fully understand it yet.
UPD1: The resource stays stuck even after rebooting the nodes and restarting the linstor-satellite services and the linstor-controller:
$ linstor r l -r one-vm-120-disk-0
+---------------------------------------------------------------+
| ResourceName      | Node | Port | Usage  | State              |
|---------------------------------------------------------------|
| one-vm-120-disk-0 | m1c4 | 7051 |        | DELETING           |
| one-vm-120-disk-0 | m1c6 | 7051 | Unused | Resizing, UpToDate |
| one-vm-120-disk-0 | m1c9 | 7051 | Unused | Resizing, UpToDate |
+---------------------------------------------------------------+
$ linstor r lv -r one-vm-120-disk-0
+--------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource          | StoragePool          | VolumeNr | MinorNr | DeviceName    | Allocated | InUse  | State              |
|--------------------------------------------------------------------------------------------------------------------------------|
| m1c4 | one-vm-120-disk-0 | DfltDisklessStorPool | 0        | 1051    |               | 20 GiB    |        | Resizing, Unknown  |
| m1c6 | one-vm-120-disk-0 | Data                 | 0        | 1051    |               | 20 GiB    | Unused | Resizing, UpToDate |
| m1c9 | one-vm-120-disk-0 | Data                 | 0        | 1051    | /dev/drbd1051 | 20 GiB    | Unused | Resizing, UpToDate |
+--------------------------------------------------------------------------------------------------------------------------------+
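For anyone who wants to spot such rows from a script, here is a rough sketch that filters the table printed by linstor r l for rows whose State column contains DELETING. The function name list_deleting is hypothetical, and the parsing assumes the pipe-separated layout shown above with State as the last column; other client versions may print a different layout.

```shell
# Hypothetical helper: read `linstor r l` table output on stdin and print
# "<ResourceName> <Node>" for every row whose last column contains DELETING.
list_deleting() {
    awk -F'|' 'NF >= 6 && $(NF-1) ~ /DELETING/ {
        # Trim the padding around the ResourceName and Node cells.
        gsub(/^[ \t]+|[ \t]+$/, "", $2)
        gsub(/^[ \t]+|[ \t]+$/, "", $3)
        print $2, $3
    }'
}
```

Usage would be linstor r l -r one-vm-120-disk-0 | list_deleting, which for the output above should print the m1c4 row only.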
UPD2: The resource stays stuck in Resizing even after running linstor node lost m1c4:
$ linstor r l -r one-vm-120-disk-0
+---------------------------------------------------------------+
| ResourceName      | Node | Port | Usage  | State              |
|---------------------------------------------------------------|
| one-vm-120-disk-0 | m1c6 | 7051 | Unused | Resizing, UpToDate |
| one-vm-120-disk-0 | m1c9 | 7051 | Unused | Resizing, UpToDate |
+---------------------------------------------------------------+
$ linstor r lv -r one-vm-120-disk-0
+-----------------------------------------------------------------------------------------------------------------------+
| Node | Resource          | StoragePool | VolumeNr | MinorNr | DeviceName    | Allocated | InUse  | State              |
|-----------------------------------------------------------------------------------------------------------------------|
| m1c6 | one-vm-120-disk-0 | Data        | 0        | 1051    |               | 20 GiB    | Unused | Resizing, UpToDate |
| m1c9 | one-vm-120-disk-0 | Data        | 0        | 1051    | /dev/drbd1051 | 20 GiB    | Unused | Resizing, UpToDate |
+-----------------------------------------------------------------------------------------------------------------------+