[BUG?] free_extent_update_loop: errno=-22 unknown
Hello, I was converting my HDD array from linear to RAID6 (data) / RAID1C3 (metadata). About 40% into the balance the filesystem was forced into read-only mode; the balance will not finish and I am unable to cancel it, so I have to mount the FS with skip_balance.
dmesg gives me the following errors:
[114905.372197] BTRFS: error (device sda: state A) in find_free_extent_update_loop:4129: errno=-22 unknown
[114905.372205] BTRFS info (device sda: state EA): forced readonly
[114905.372211] BTRFS: error (device sda: state EA) in reset_balance_state:3595: errno=-22 unknown
[114905.372221] BTRFS info (device sda: state EA): balance: canceled
My array contains 6 x 8TB disks, and btrfs check is unable to find any issues. This is the output of btrfs filesystem usage:
Overall:
    Device size:                  44.60TiB
    Device allocated:             37.99TiB
    Device unallocated:            6.61TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         34.06TiB
    Free (estimated):              6.95TiB   (min: 4.79TiB)
    Free (statfs, df):             6.11TiB
    Data ratio:                       1.51
    Metadata ratio:                   2.52
    Global reserve:              512.00MiB   (used: 0.00B)
    Multiple profiles:                 yes   (data, metadata, system)

Data,single: Size:10.23TiB, Used:9.92TiB (96.93%)
   /dev/sda        4.10TiB
   /dev/sdb        6.14TiB

Data,RAID6: Size:14.80TiB, Used:12.88TiB (86.99%)
   /dev/sda        1.79TiB
   /dev/sdb       77.04GiB
   /dev/sde        6.43TiB
   /dev/sdd        6.44TiB
   /dev/sdc        6.44TiB
   /dev/sdf        6.44TiB
   /dev/nvme1n1   64.00GiB

Metadata,DUP: Size:15.00GiB, Used:10.95GiB (72.98%)
   /dev/sda       20.00GiB
   /dev/sdb       10.00GiB

Metadata,RAID1C3: Size:16.00GiB, Used:15.31GiB (95.69%)
   /dev/sda        1.00GiB
   /dev/sdb        1.00GiB
   /dev/sde        8.00GiB
   /dev/sdd       11.00GiB
   /dev/sdc       13.00GiB
   /dev/sdf       13.00GiB
   /dev/nvme1n1    1.00GiB

System,DUP: Size:8.00MiB, Used:1.05MiB (13.09%)
   /dev/sda       16.00MiB

System,RAID1C3: Size:32.00MiB, Used:1.47MiB (4.59%)
   /dev/sde       32.00MiB
   /dev/sdc       32.00MiB
   /dev/sdf       32.00MiB

Unallocated:
   /dev/sda        1.37TiB
   /dev/sdb        1.06TiB
   /dev/sde      856.96GiB
   /dev/sdd      848.00GiB
   /dev/sdc      845.96GiB
   /dev/sdf      845.96GiB
   /dev/nvme1n1  888.87GiB
/dev/nvme1n1 was added in the hope that it could fix the balance, but it did not.
I believe it could have something to do with "Metadata,RAID1C3: Size:16.00GiB, Used:15.31GiB (95.69%)" running full.
I have read that the metadata allocation should grow on its own, but apparently that is not happening here. Since I am not able to cancel the balance process, I cannot run a balance on the metadata either.
Full Kernel LOG: http://cwillu.com:8080/62.178.168.195/2
It really is the metadata that is full; here is the latest kernel log:
[133728.876517] BTRFS info (device sda: state A): dumping space info:
[133728.876520] BTRFS info (device sda: state A): space_info DATA has 5050423623680 free, is not full
[133728.876522] BTRFS info (device sda: state A): space_info total=30382508408832, used=25064527589376, pinned=0, reserved=0, may_use=0, readonly=267557195776 zone_unusable=0
[133728.876526] BTRFS info (device sda: state A): space_info METADATA has 3459383296 free, is full
[133728.876528] BTRFS info (device sda: state A): space_info total=32212254720, used=28193619968, pinned=21725184, reserved=32768, may_use=537165824, readonly=327680 zone_unusable=0
[133728.876532] BTRFS info (device sda: state A): space_info SYSTEM has 39206912 free, is not full
[133728.876534] BTRFS info (device sda: state A): space_info total=41943040, used=2736128, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
[133728.876537] BTRFS info (device sda: state A): global_block_rsv: size 536870912 reserved 536854528
[133728.876539] BTRFS info (device sda: state A): trans_block_rsv: size 0 reserved 0
[133728.876540] BTRFS info (device sda: state A): chunk_block_rsv: size 0 reserved 0
[133728.876541] BTRFS info (device sda: state A): delayed_block_rsv: size 0 reserved 0
[133728.876542] BTRFS info (device sda: state A): delayed_refs_rsv: size 16777216 reserved 16384
[133728.876544] BTRFS: error (device sda: state A) in __btrfs_free_extent:3077: errno=-28 No space left
[133728.876550] BTRFS info (device sda: state EA): forced readonly
[133728.876552] BTRFS error (device sda: state EA): failed to run delayed ref for logical 54198081880064 num_bytes 16384 type 176 action 2 ref_mod 1: -28
[133728.876555] BTRFS: error (device sda: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left
[133728.876560] BTRFS warning (device sda: state EA): Skipping commit of aborted transaction.
[133728.876561] BTRFS: error (device sda: state EA) in cleanup_transaction:1984: errno=-28 No space left
[133728.879119] BTRFS info (device sda: state EA): balance: ended with status: -30
Is there any way to fix this?
I have now bought a second disk array to recover my data, so unfortunately I cannot test any fixes. But hopefully this issue can be replicated simply in a VM by doing the following (a rough script version is sketched after the list):
- Add 6 disks (same size) to the VM
- Format 1 disk with btrfs
- Fill the disk ~70%
- btrfs device add a second disk, then fill again
- btrfs device add a third disk, then fill again
- Convert the array: btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/Data --background
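For convenience, here is the same sequence as a rough loop-device script. It is untested; the 8GiB image size, fill amounts, and paths are placeholders (not what I used on the real array), and it only creates the three devices that the steps above actually add to the filesystem:

truncate -s 8G disk1.img disk2.img disk3.img
DEV1=$(losetup -f --show disk1.img)
DEV2=$(losetup -f --show disk2.img)
DEV3=$(losetup -f --show disk3.img)
mkfs.btrfs -f -L Data "$DEV1"
mkdir -p /mnt/Data
mount "$DEV1" /mnt/Data
dd if=/dev/zero of=/mnt/Data/fill1 bs=1M count=5600   # ~70% of the first 8GiB device
btrfs device add -f "$DEV2" /mnt/Data
dd if=/dev/zero of=/mnt/Data/fill2 bs=1M count=5600   # fill again
btrfs device add -f "$DEV3" /mnt/Data
dd if=/dev/zero of=/mnt/Data/fill3 bs=1M count=5600   # fill again
btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/Data --background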
The original disks were 6 x 8TB Seagate IronWolf 7200RPM (ST8000VN004-3CP101).
Zygo from #btrfs:matrix.org has more insight into this issue and more technical knowledge of what is happening here.
The reproducer doesn't produce the same issue: it produces an ENOSPC because it sets up a filesystem that does not have enough free space to complete a raid6 conversion.
In the original report, btrfs fi usage says there are hundreds or thousands of GiB of unallocated space on every device. There's plenty of room to expand metadata, but btrfs isn't doing that for unknown reasons.
There are some dup metadata block groups which might be confusing the allocator (maybe if it's considering them free space in the "do we need to allocate more metadata space" calculation but not in the "allocate a page in metadata space" calculation). Removing those block groups might work around the problem, but since btrfs balance cancel goes straight to ENOSPC, it's not possible to apply that solution.
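For reference, that workaround would look roughly like this. It is a hypothetical sketch only, since in this report the cancel itself ends in ENOSPC, and it assumes the filesystem is mounted read-write with skip_balance (as at the end of the history below):

btrfs balance cancel /mnt/Data                        # fails with ENOSPC here, which is what blocks the workaround
btrfs balance start -mconvert=raid1c3,soft /mnt/Data  # 'soft' skips chunks already in the target profile, so only the leftover DUP metadata block groups get rewritten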
This is the history of what exactly I did; minor details are missing because the sessions were closed:
lvconvert --uncache /dev/mapper/R5-Data
lvconvert --type linear R5/Data
lvresize -y -l30523312 R5/Data
lvconvert --type striped --stripes 1 R5/Data
pvmove /dev/sda1
vgremove /dev/sda1
vgreduce R5 /dev/sda1
btrfs fi resize -5T /mnt/Data/
mkfs.btrfs -L Data /dev/sda -f
mount -o compress=zstd:15,relatime /mnt/sda /mnt/New_Data
pvmove /dev/sdb1
vgreduce R5 /dev/sdb1
rsync -avhP SomeData /mnt/New_Data/ # About 4TB
btrfs device add -f /dev/sdb /mnt/New_Data/
btrfs fi resize 18T /mnt/Data/
lvresize -L 20T R5/Data
rsync -avhP MoreData /mnt/New_Data/ # About 8TB
btrfs fi resize 10T /mnt/Data/
lvresize -L 13T /dev/R5/Data
pvmove /dev/sdc
vgremove R5 /dev/sdc
vgreduce R5 /dev/sdc
pvmove /dev/sde
vgreduce R5 /dev/sde
pvremove /dev/sde
pvmove /dev/sdd
vgreduce R5 /dev/sdd
pvremove /dev/sdd
btrfs device add -f /dev/sde /mnt/New_Data/
btrfs device add -f /dev/sdd /mnt/New_Data/
rsync -avhP MoreMoreData /mnt/New_Data/ # About 12TB
### /mnt/New_Data data usage was 24TB; later some data was removed, down to 22TB
umount -l /mnt/Data
umount -l /mnt/New_Data
vgremove R5 --force
mount -o compress=zstd:15,relatime /mnt/sda /mnt/Data
mount -a
btrfs device add -f /dev/sdc /mnt/Data
btrfs device add -f /dev/sdf /mnt/Data
btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/Data --background # After ~70% done: metadata full, read-only mode. I saw the metadata expanding a couple of times.
mount -o rw,skip_balance /mnt/Data