linstor-server

Resource stuck on DELETING after timeout operation

Open kvaps opened this issue 7 years ago • 3 comments

I once got an error while removing a resource from one node:

Description:
    Failed to delete volume [hosting-vol-data-web-hc1-wd11-0_00000]
Cause:
    External command timed out
Additional information:
    External command: lvremove -f data/hosting-vol-data-web-hc1-wd11-0_00000

After some time I checked LVM on the node: the volume had been fully removed, but LINSTOR still showed the resource as DELETING.

After restarting the linstor-satellite service, the resource disappeared completely.
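
The check described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names lv_exists and check_stuck_delete and the LVS_CMD override are hypothetical and not part of LINSTOR.

```shell
#!/bin/sh
# Hypothetical helper, not part of LINSTOR.
# lv_exists returns 0 if LVM still reports the logical volume.
# LVS_CMD may be overridden (e.g. for dry runs); it defaults to the real lvs.
lv_exists() {
    ${LVS_CMD:-lvs} "$1" >/dev/null 2>&1
}

# Prints "removed" if the LV is gone from LVM, "still-present" otherwise.
check_stuck_delete() {
    if lv_exists "$1"; then
        echo "still-present"
    else
        echo "removed"
    fi
}
```

Calling check_stuck_delete data/hosting-vol-data-web-hc1-wd11-0_00000 and getting "removed" while LINSTOR still lists the resource as DELETING corresponds to the situation in this report. In that case, restarting the satellite (for example with systemctl restart linstor-satellite, assuming it runs as a systemd unit) is what cleared the stale entry here.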

kvaps avatar Dec 20 '18 12:12 kvaps

"External command timed out" means that the lvremove utility was stuck for longer than what LINSTOR's timeout for running external commands allows (the default is 45 seconds). The reason for that is normally outside of LINSTOR, in this case probably some problem with LVM. LVM can become extremely slow if the filters are not configured correctly, because it attempts to scan various kinds of block devices for physical volumes, etc. - and sometimes that includes existing DRBD devices.
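
To see whether LVM itself is the slow part, one can time an LVM command against the 45-second limit mentioned above. A rough sketch only: exceeds_limit is a hypothetical helper name, and date +%s assumes a date implementation that supports %s (GNU and BSD do).

```shell
#!/bin/sh
# Hypothetical helper: run a command and succeed (exit 0) only if it took
# longer than the given number of whole seconds.
exceeds_limit() {
    limit="$1"
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    [ $((end - start)) -gt "$limit" ]
}
```

For example, exceeds_limit 45 lvs && echo "lvs is slower than the external command timeout" flags a node where a plain LVM scan already takes longer than the default timeout. If it does, the filter and global_filter settings in the devices section of /etc/lvm/lvm.conf are the first place to look, so that DRBD devices are excluded from scanning.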

raltnoeder avatar Dec 20 '18 12:12 raltnoeder

@raltnoeder I know that. I'm just saying that the volume was eventually removed, but the satellite did not notice this until it was restarted.

kvaps avatar Dec 20 '18 12:12 kvaps

I just hit this problem again: a resource may get stuck in DELETING if the deletion is run while the resource is RESIZING. Or possibly the other way around; I don't fully understand it yet.

UPD1: The resource remains stuck even after rebooting the nodes and restarting the linstor-satellites and linstor-controller:

$ linstor r l -r one-vm-120-disk-0
+---------------------------------------------------------------+
| ResourceName      | Node | Port | Usage  |              State |
|---------------------------------------------------------------|
| one-vm-120-disk-0 | m1c4 | 7051 |        |           DELETING |
| one-vm-120-disk-0 | m1c6 | 7051 | Unused | Resizing, UpToDate |
| one-vm-120-disk-0 | m1c9 | 7051 | Unused | Resizing, UpToDate |
+---------------------------------------------------------------+
$ linstor r lv -r one-vm-120-disk-0
+--------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource          | StoragePool          | VolumeNr | MinorNr | DeviceName    | Allocated | InUse  |              State |
|--------------------------------------------------------------------------------------------------------------------------------|
| m1c4 | one-vm-120-disk-0 | DfltDisklessStorPool | 0        | 1051    |               | 20 GiB    |        |  Resizing, Unknown |
| m1c6 | one-vm-120-disk-0 | Data                 | 0        | 1051    |               | 20 GiB    | Unused | Resizing, UpToDate |
| m1c9 | one-vm-120-disk-0 | Data                 | 0        | 1051    | /dev/drbd1051 | 20 GiB    | Unused | Resizing, UpToDate |
+--------------------------------------------------------------------------------------------------------------------------------+

UPD2: The resource remains stuck in Resizing even after running linstor node lost m1c4:

$ linstor r l -r one-vm-120-disk-0
+---------------------------------------------------------------+
| ResourceName      | Node | Port | Usage  |              State |
|---------------------------------------------------------------|
| one-vm-120-disk-0 | m1c6 | 7051 | Unused | Resizing, UpToDate |
| one-vm-120-disk-0 | m1c9 | 7051 | Unused | Resizing, UpToDate |
+---------------------------------------------------------------+
$ linstor r lv -r one-vm-120-disk-0
+-----------------------------------------------------------------------------------------------------------------------+
| Node | Resource          | StoragePool | VolumeNr | MinorNr | DeviceName    | Allocated | InUse  |              State |
|-----------------------------------------------------------------------------------------------------------------------|
| m1c6 | one-vm-120-disk-0 | Data        | 0        | 1051    |               | 20 GiB    | Unused | Resizing, UpToDate |
| m1c9 | one-vm-120-disk-0 | Data        | 0        | 1051    | /dev/drbd1051 | 20 GiB    | Unused | Resizing, UpToDate |
+-----------------------------------------------------------------------------------------------------------------------+

kvaps avatar May 02 '19 12:05 kvaps