Cannot migrate from overfilled node

Open kvaps opened this issue 5 years ago • 4 comments

Hi, today I ran into a problem similar to https://github.com/LINBIT/linstor-server/issues/156:

linstor resource create m10c4 one-vm-10230-disk-0 -s DfltDisklessStorPool
(Node: 'm10c8') Failed to adjust DRBD resource one-vm-10230-disk-0
Error reports: [ 5FAEE457-585CD-024062 ]

5FAEE457-585CD-024062.log

# linstor r l -r one-vm-10230-disk-0
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns                   ┊    State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10230-disk-0 ┊ m10c4  ┊ 56330 ┊ Unused ┊ Connecting(m10c8)       ┊ Diskless ┊ 2020-11-16 14:38:21 ┊
┊ one-vm-10230-disk-0 ┊ m10c8  ┊ 56330 ┊ InUse  ┊ StandAlone(m8c25,m10c4) ┊ Diskless ┊                     ┊
┊ one-vm-10230-disk-0 ┊ m11c16 ┊ 56330 ┊ Unused ┊ Ok                      ┊ UpToDate ┊                     ┊
┊ one-vm-10230-disk-0 ┊ m8c25  ┊ 56330 ┊ Unused ┊ Connecting(m10c8)       ┊ Diskless ┊ 2020-11-16 14:36:48 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯
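
For anyone triaging a similar listing, here is a minimal Python sketch that picks out the rows whose Conns column is not `Ok`. It is only an illustration: the function name `unhealthy_rows` and the column positions are assumptions based on the table layout shown above, not part of any LINSTOR API.

```python
from __future__ import annotations

# Abbreviated copy of the listing above (borders shortened for readability).
SAMPLE = """\
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns                   ┊    State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10230-disk-0 ┊ m10c4  ┊ 56330 ┊ Unused ┊ Connecting(m10c8)       ┊ Diskless ┊ 2020-11-16 14:38:21 ┊
┊ one-vm-10230-disk-0 ┊ m10c8  ┊ 56330 ┊ InUse  ┊ StandAlone(m8c25,m10c4) ┊ Diskless ┊                     ┊
┊ one-vm-10230-disk-0 ┊ m11c16 ┊ 56330 ┊ Unused ┊ Ok                      ┊ UpToDate ┊                     ┊
┊ one-vm-10230-disk-0 ┊ m8c25  ┊ 56330 ┊ Unused ┊ Connecting(m10c8)       ┊ Diskless ┊ 2020-11-16 14:36:48 ┊
"""


def unhealthy_rows(listing: str) -> list[tuple[str, str]]:
    """Return (node, conns) pairs for rows whose Conns column is not 'Ok'.

    Assumes the column order shown in the listing above:
    ResourceName, Node, Port, Usage, Conns, State, CreatedOn.
    """
    found = []
    for raw in listing.splitlines():
        line = raw.strip()
        if not line.startswith("┊"):
            continue  # skip table borders and blank lines
        cells = [cell.strip() for cell in line.strip("┊").split("┊")]
        if cells[0] == "ResourceName":
            continue  # skip the header row
        node, conns = cells[1], cells[4]
        if conns != "Ok":
            found.append((node, conns))
    return found


if __name__ == "__main__":
    for node, conns in unhealthy_rows(SAMPLE):
        print(f"{node}: {conns}")
```

Run against the listing above, this flags m10c4, m10c8 and m8c25, and leaves only m11c16 (the UpToDate replica) unflagged.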

kvaps avatar Nov 16 '20 14:11 kvaps

Not sure how that happened... Can you show me the corresponding DRBD .res file on m10c4?

ghernadi avatar Nov 16 '20 15:11 ghernadi

Hi @ghernadi, sorry, I can't anymore: m10c4 was rebooted and its configuration was regenerated. However, I can provide the kernel log if you want it.

I am currently running a loop to upgrade the OS image on all our nodes. This node had a quite old DRBD module version, 9.0.19.

kvaps avatar Nov 16 '20 15:11 kvaps

Sorry, my bad. I misread the ErrorReport. I was wondering why Linstor was trying to create metadata for DRBD when you created the resource in a diskless storage pool. Now that I have re-read the ErrorReport, I see that Linstor does not try to do that.

Here is what happens: you create a new resource in the DfltDisklessStorPool. Linstor detects this, so the resource automatically gets the DISKLESS flag set. That means the new resource ending up as Diskless is expected and fine. For now, though, I have to assume that the diskful resource on m10c8 (which created the ErrorReport you showed) was already broken before this new diskless resource was created (due to an overfull thin-lvm). However, as you create the new diskless resource, all other DRBD-peers also have to be updated. In the end, the attempt to update the resource on m10c8 triggered the "No space left on device" ErrorReport you attached here.

Which means, unless I am missing something, I don't really see what Linstor could do in this situation...
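
The sequence described above can be illustrated with a toy model. This is not LINSTOR code: the `Peer` class, the `healthy` flag and `create_diskless` are hypothetical names, and the only point is that adding one diskless peer triggers an adjust on every existing peer, so a failure surfaces on a node that was already broken.

```python
from __future__ import annotations

import errno
from dataclasses import dataclass


@dataclass
class Peer:
    node: str
    healthy: bool = True  # False models a node whose backing storage is full

    def adjust(self) -> None:
        """Stand-in for re-applying the DRBD configuration on this node."""
        if not self.healthy:
            raise OSError(errno.ENOSPC, "No space left on device")


def create_diskless(new_node: str, peers: list[Peer]) -> tuple[list[Peer], dict[str, str]]:
    """Add a diskless peer, then adjust every peer and collect per-node failures."""
    all_peers = peers + [Peer(new_node)]
    failures: dict[str, str] = {}
    for peer in all_peers:
        try:
            peer.adjust()
        except OSError as exc:
            failures[peer.node] = exc.strerror
    return all_peers, failures


if __name__ == "__main__":
    existing = [Peer("m10c8", healthy=False), Peer("m11c16")]
    _, failed = create_diskless("m10c4", existing)
    print(failed)
```

In this model the new peer on m10c4 adjusts without error, yet the operation as a whole still reports a failure for m10c8, which matches the error message in the original report.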

ghernadi avatar Nov 17 '20 06:11 ghernadi

> However, as you create the new diskless resource, all other DRBD-peers also have to be updated

@ghernadi thank you for the detailed explanation, it is very clear now.

> I don't really see what Linstor could do in this situation...

I think that in this case Linstor should not rely on broken replicas. I guess the same problem also occurs when deleting resources that have a replica on a failed node: https://github.com/LINBIT/linstor-server/issues/112
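
One possible reading of this suggestion, sketched as a toy policy in Python. Everything here is hypothetical (the `evaluate` function, the idea of a `required` set) and does not describe how LINSTOR behaves or is planned to behave: failures on peers outside the required set are downgraded to warnings instead of failing the whole operation.

```python
from __future__ import annotations


def evaluate(results: dict[str, bool], required: set[str]) -> tuple[bool, list[str]]:
    """Decide whether an operation succeeded when some peers may be broken.

    results  maps node name -> whether the adjust on that node succeeded.
    required is the set of nodes that must succeed (for example the new
             diskless peer and at least one UpToDate replica).
    Returns (success, warnings), where warnings lists the failed nodes
    that were not required.
    """
    warnings = sorted(
        node for node, ok in results.items() if not ok and node not in required
    )
    success = all(results.get(node, False) for node in required)
    return success, warnings


if __name__ == "__main__":
    outcome = {"m10c4": True, "m10c8": False, "m11c16": True}
    print(evaluate(outcome, required={"m10c4", "m11c16"}))
```

Which nodes belong in the required set is the real design question, and this sketch deliberately leaves it open.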

kvaps avatar Nov 17 '20 16:11 kvaps