
[BUG?] free_extent_update_loop: errno=-22 unknown

Open YourSandwich opened this issue 2 years ago • 7 comments

Hello, I was converting my HDD array from linear to RAID6 (data) / RAID1C3 (metadata). About 40% into the balance process my FS was forced into read-only mode; the balance won't finish and I am unable to cancel it, so I need to mount the FS with skip_balance.

dmesg gives me the following errors:

[114905.372197] BTRFS: error (device sda: state A) in find_free_extent_update_loop:4129: errno=-22 unknown
[114905.372205] BTRFS info (device sda: state EA): forced readonly
[114905.372211] BTRFS: error (device sda: state EA) in reset_balance_state:3595: errno=-22 unknown
[114905.372221] BTRFS info (device sda: state EA): balance: canceled

My array contains 6 x 8TB disks, and btrfs check is unable to find any issues. This is the output of btrfs filesystem usage:

Overall:
    Device size:                  44.60TiB
    Device allocated:             37.99TiB
    Device unallocated:            6.61TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         34.06TiB
    Free (estimated):              6.95TiB      (min: 4.79TiB)
    Free (statfs, df):             6.11TiB
    Data ratio:                       1.51
    Metadata ratio:                   2.52
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                 yes      (data, metadata, system)

Data,single: Size:10.23TiB, Used:9.92TiB (96.93%)
   /dev/sda        4.10TiB
   /dev/sdb        6.14TiB

Data,RAID6: Size:14.80TiB, Used:12.88TiB (86.99%)
   /dev/sda        1.79TiB
   /dev/sdb       77.04GiB
   /dev/sde        6.43TiB
   /dev/sdd        6.44TiB
   /dev/sdc        6.44TiB
   /dev/sdf        6.44TiB
   /dev/nvme1n1   64.00GiB

Metadata,DUP: Size:15.00GiB, Used:10.95GiB (72.98%)
   /dev/sda       20.00GiB
   /dev/sdb       10.00GiB

Metadata,RAID1C3: Size:16.00GiB, Used:15.31GiB (95.69%)
   /dev/sda        1.00GiB
   /dev/sdb        1.00GiB
   /dev/sde        8.00GiB
   /dev/sdd       11.00GiB
   /dev/sdc       13.00GiB
   /dev/sdf       13.00GiB
   /dev/nvme1n1    1.00GiB

System,DUP: Size:8.00MiB, Used:1.05MiB (13.09%)
   /dev/sda       16.00MiB

System,RAID1C3: Size:32.00MiB, Used:1.47MiB (4.59%)
   /dev/sde       32.00MiB
   /dev/sdc       32.00MiB
   /dev/sdf       32.00MiB

Unallocated:
   /dev/sda        1.37TiB
   /dev/sdb        1.06TiB
   /dev/sde      856.96GiB
   /dev/sdd      848.00GiB
   /dev/sdc      845.96GiB
   /dev/sdf      845.96GiB
   /dev/nvme1n1  888.87GiB

/dev/nvme1n1 was added in the hope that it could fix the balance, but it did not.

YourSandwich avatar Jun 28 '23 17:06 YourSandwich

I believe it could have something to do with the metadata running full: "Metadata,RAID1C3: Size:16.00GiB, Used:15.31GiB (95.69%)".

I have read that the size should adjust itself, but apparently that is not happening here. Since I am not able to cancel the balance process, I cannot run a balance on the metadata either.
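For reference, the usual way to reclaim nearly-full metadata on a writable btrfs filesystem is a usage-filtered balance; a minimal sketch, assuming the filesystem were mounted read-write at /mnt/Data (which the stuck balance prevents here):

```shell
# Sketch: reclaim nearly-full metadata space with a filtered balance.
# Requires root and a writable filesystem; /mnt/Data is illustrative.
reclaim_metadata() {
    local mnt="$1"
    # Show the per-profile space breakdown to confirm metadata is nearly full.
    btrfs filesystem usage "$mnt"
    # Compact metadata block groups that are at most 10% used, freeing
    # whole block groups back to the unallocated pool.
    btrfs balance start -musage=10 "$mnt"
}
# Usage (as root): reclaim_metadata /mnt/Data
```

The `-musage=10` filter only touches mostly-empty metadata block groups, so it is cheap compared to a full metadata balance.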

YourSandwich avatar Jun 28 '23 17:06 YourSandwich

Full Kernel LOG: http://cwillu.com:8080/62.178.168.195/2

YourSandwich avatar Jun 28 '23 17:06 YourSandwich

It really is the metadata that is full; here is the latest kernel log:

[133728.876517] BTRFS info (device sda: state A): dumping space info:
[133728.876520] BTRFS info (device sda: state A): space_info DATA has 5050423623680 free, is not full
[133728.876522] BTRFS info (device sda: state A): space_info total=30382508408832, used=25064527589376, pinned=0, reserved=0, may_use=0, readonly=267557195776 zone_unusable=0
[133728.876526] BTRFS info (device sda: state A): space_info METADATA has 3459383296 free, is full
[133728.876528] BTRFS info (device sda: state A): space_info total=32212254720, used=28193619968, pinned=21725184, reserved=32768, may_use=537165824, readonly=327680 zone_unusable=0
[133728.876532] BTRFS info (device sda: state A): space_info SYSTEM has 39206912 free, is not full
[133728.876534] BTRFS info (device sda: state A): space_info total=41943040, used=2736128, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
[133728.876537] BTRFS info (device sda: state A): global_block_rsv: size 536870912 reserved 536854528
[133728.876539] BTRFS info (device sda: state A): trans_block_rsv: size 0 reserved 0
[133728.876540] BTRFS info (device sda: state A): chunk_block_rsv: size 0 reserved 0
[133728.876541] BTRFS info (device sda: state A): delayed_block_rsv: size 0 reserved 0
[133728.876542] BTRFS info (device sda: state A): delayed_refs_rsv: size 16777216 reserved 16384
[133728.876544] BTRFS: error (device sda: state A) in __btrfs_free_extent:3077: errno=-28 No space left
[133728.876550] BTRFS info (device sda: state EA): forced readonly
[133728.876552] BTRFS error (device sda: state EA): failed to run delayed ref for logical 54198081880064 num_bytes 16384 type 176 action 2 ref_mod 1: -28
[133728.876555] BTRFS: error (device sda: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left
[133728.876560] BTRFS warning (device sda: state EA): Skipping commit of aborted transaction.
[133728.876561] BTRFS: error (device sda: state EA) in cleanup_transaction:1984: errno=-28 No space left
[133728.879119] BTRFS info (device sda: state EA): balance: ended with status: -30

Is there any way to fix this?

YourSandwich avatar Jun 28 '23 22:06 YourSandwich

I have now bought a second disk array to recover my data, so unfortunately I cannot test any fixes. But hopefully this issue can be easily replicated in a VM by doing:

  1. Add 6 Disks (same size) into the VM
  2. Format 1 Disk with btrfs.
  3. Fill the disk ~70%
  4. btrfs device add second disk
  5. Fill again
  6. btrfs device add third disk
  7. Fill again
  8. convert the array -> btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/Data --background

The original Disks were 6 x 8TB Seagate IronWolf 7200RPM (ST8000VN004-3CP101)
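The numbered steps above could be scripted with sparse loop devices instead of real disks; a rough sketch (sizes scaled down, file and mount names are illustrative, requires root):

```shell
# Sketch of the reproducer using loop devices; run as root.
reproduce() {
    local mnt=/mnt/Data
    mkdir -p "$mnt"

    # 1. Six same-sized sparse backing files stand in for the disks.
    local i
    for i in 1 2 3 4 5 6; do
        truncate -s 10G "disk$i.img"
    done

    # 2-3. Format the first disk with btrfs and fill it to ~70%.
    local dev1
    dev1=$(losetup -f --show disk1.img)
    mkfs.btrfs -f "$dev1"
    mount "$dev1" "$mnt"
    dd if=/dev/urandom of="$mnt/fill1" bs=1M count=7168

    # 4-7. Add the second and third disks, filling after each add.
    local dev
    for i in 2 3; do
        dev=$(losetup -f --show "disk$i.img")
        btrfs device add -f "$dev" "$mnt"
        dd if=/dev/urandom of="$mnt/fill$i" bs=1M count=7168
    done

    # 8. Convert the array (btrfs allows raid6 with as few as 3 devices).
    btrfs balance start -dconvert=raid6 -mconvert=raid1c3 "$mnt" --background
}
# Usage (as root, in an empty scratch directory): reproduce
```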

YourSandwich avatar Jul 17 '23 17:07 YourSandwich

Zygo from #btrfs:matrix.org has more insight into this issue and has more technical knowledge of what is happening here.

YourSandwich avatar Jul 17 '23 18:07 YourSandwich

The reproducer doesn't produce the same issue. The reproducer will produce an ENOSPC because it sets up a filesystem that does not have enough free space to complete a raid6 conversion.

In the original report, btrfs fi usage is saying there are hundreds or thousands of GiB of unallocated space on every device. There's plenty of room to expand metadata, but btrfs isn't doing that for unknown reasons.

There are some dup metadata block groups which might be confusing the allocator (maybe if it's considering them free space in the "do we need to allocate more metadata space" calculation but not in the "allocate a page in metadata space" calculation). Removing those block groups might work around the problem, but since btrfs balance cancel goes straight to ENOSPC, it's not possible to apply that solution.
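If the filesystem could be brought back read-write, removing the leftover DUP block groups would amount to a profile conversion restricted to the unconverted chunks; a sketch of what that would look like (hypothetical here, since balance operations currently hit ENOSPC):

```shell
# Sketch: convert only the remaining DUP metadata/system block groups.
# The 'soft' modifier skips chunks that already have the target profile,
# so the existing RAID1C3 chunks are left alone. Requires root and a
# writable filesystem; /mnt/Data is illustrative.
convert_leftover_dup() {
    local mnt="$1"
    # -f is required because btrfs refuses to convert system chunks otherwise.
    btrfs balance start -mconvert=raid1c3,soft -sconvert=raid1c3,soft -f "$mnt"
}
# Usage (as root): convert_leftover_dup /mnt/Data
```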

Zygo avatar Jul 17 '23 18:07 Zygo

This is the exact history of what I have done; minor details are missing because of closed shell sessions:

lvconvert --uncache /dev/mapper/R5-Data
lvconvert --type linear R5/Data
lvresize -y -l30523312 R5/Data
lvconvert --type striped --stripes 1 R5/Data
pvmove /dev/sda1
vgremove /dev/sda1
vgreduce R5 /dev/sda1
btrfs fi resize -5T /mnt/Data/
mkfs.btrfs -L Data /dev/sda -f
mount -o compress=zstd:15,relatime /mnt/sda /mnt/New_Data
pvmove /dev/sdb1
vgreduce R5 /dev/sdb1
rsync -avhP SomeData /mnt/New_Data/ # About 4TB
btrfs device add -f /dev/sdb /mnt/New_Data/
btrfs fi resize 18T /mnt/Data/
lvresize -L 20T R5/Data
rsync -avhP MoreData /mnt/New_Data/ # About 8TB
btrfs fi resize 10T /mnt/Data/
lvresize -L 13T /dev/R5/Data
pvmove /dev/sdc
vgremove R5 /dev/sdc
vgreduce R5 /dev/sdc
pvmove /dev/sde
vgreduce R5 /dev/sde
pvremove /dev/sde
pvmove /dev/sdd
vgreduce R5 /dev/sdd
pvremove /dev/sdd
btrfs device add -f /dev/sde /mnt/New_Data/
btrfs device add -f /dev/sdd /mnt/New_Data/
rsync -avhP MoreMoreData /mnt/New_Data/ # About 12TB
### /mnt/New_Data data usage 24TB; later some data was removed, down to 22TB
umount -l /mnt/Data
umount -l /mnt/New_Data
vgremove R5 --force
mount -o compress=zstd:15,relatime /mnt/sda /mnt/Data
mount -a
btrfs device add -f /dev/sdc /mnt/Data
btrfs device add -f /dev/sdf /mnt/Data
btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/Data --background # After ~70% done, metadata full, read-only mode. I saw the metadata expanding a couple of times.
mount -o rw,skip_balance /mnt/Data

YourSandwich avatar Jul 17 '23 19:07 YourSandwich