Fix file system corruption caused by sparse image files
This fixes the following error reported by a Linux virtual machine (Ubuntu 23.10.1):
"
EXT4-fs error (device vda2): ext4_validate_block_bitmap:421: comm kworker/u20:0 bg 2230: bad block bitmap checksum
EXT4-fs (vda2): Delayed block allocation failed for inode 14327754 at logical offset 0 with max blocks 47 with error 74
EXT4-fs (vda2): This should not happen!! Data will be lost
"
The cause is that the virtual machine disk image is created as a sparse file.
The truncate command creates a sparse file: space is not allocated until it is actually used. That is the better choice most of the time, but it may not suit virtual machines. Under heavy I/O load, allocating space on demand can add latency and interrupt operations. As a result, a later write may complete before an earlier one, and the incorrect write order may corrupt the file system.
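To make the sparse-file behavior concrete, here is a minimal sketch (not part of this PR) that creates a file by extending it, as truncate does, and then compares its logical size with the space actually allocated on disk. The helper names `createByTruncating` and `fileSizes` are hypothetical. Whether the allocated size ends up smaller than the logical size depends on the host filesystem, so the sketch only reports both numbers.

```swift
import Foundation

/// Hypothetical helper: creates an empty file and extends it to `size` bytes,
/// similar to what `truncate -s` does. No data is written.
func createByTruncating(at url: URL, size: UInt64) throws {
    guard FileManager.default.createFile(atPath: url.path, contents: nil) else {
        throw CocoaError(.fileWriteUnknown)
    }
    let handle = try FileHandle(forWritingTo: url)
    defer { try? handle.close() }
    try handle.truncate(atOffset: size)
}

/// Hypothetical helper: returns the logical size and the allocated size in bytes.
/// On a filesystem that supports sparse files, `allocated` can be much smaller
/// than `logical` for a file created by truncation.
func fileSizes(at url: URL) throws -> (logical: Int, allocated: Int) {
    var fresh = url
    fresh.removeAllCachedResourceValues()  // avoid stale cached metadata
    let values = try fresh.resourceValues(forKeys: [.fileSizeKey, .fileAllocatedSizeKey])
    return (values.fileSize ?? 0, values.fileAllocatedSize ?? 0)
}
```

Calling `createByTruncating` on a new path and then `fileSizes` on the same path should report the requested logical size, while the allocated size shows how much space the host has really reserved.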
I'm afraid this does not help in extreme I/O cases. Running stress-ng --iomix 2 on a btrfs filesystem still results in filesystem errors after a couple of minutes.
We tested on EXT4, and it works better there. Can you test with an EXT4 filesystem?
May I ask which hypervisor you are using? If it's Apple Virtualization, simply using a non-sparse file might not be sufficient to prevent filesystem errors. We've had an extensive discussion in #4840. I notice you are using a raw image, so I assume you are using Apple Virtualization because the default image format for QEMU is qcow2.
We are using Apple Virtualization. We are trying to build Android AOSP on an M2 Mac.
We tried setting VZDiskImageCachingMode to uncached and found that it works better. Please try this modification:
func vzDiskImage() throws -> VZDiskImageStorageDeviceAttachment? {
    if let imageURL = imageURL {
        if #available(macOS 12, *) {
            /*
             * The virtual disk cache mode has bugs: when caching is enabled
             * or set to automatic (the default value), it may corrupt the
             * Linux file system, especially under heavy I/O load.
             */
            return try VZDiskImageStorageDeviceAttachment(url: imageURL, readOnly: isReadOnly, cachingMode: VZDiskImageCachingMode.uncached, synchronizationMode: VZDiskImageSynchronizationMode.full)
        } else {
            return try VZDiskImageStorageDeviceAttachment(url: imageURL, readOnly: isReadOnly)
        }
    } else {
        return nil
    }
}
We already discussed and tried this approach before: https://github.com/utmapp/UTM/issues/4840#issuecomment-1823530081
We don't get a filesystem error initially, but after a reboot a filesystem error still shows up. The most reliable approach I've found is to switch from VZVirtioBlockDeviceConfiguration to VZNVMExpressControllerDeviceConfiguration for Linux VMs, as demonstrated in PR #5919. However, the NVMe device configuration is only available on macOS 14+ hosts. We also tried patching the kernel to throttle the cache flushing frequency, which makes heavy I/O workloads much more stable, but it does not fix the issue 100%.
Maybe we should disable caching by default below macOS 14 and switch to VZNVMExpressControllerDeviceConfiguration on macOS 14+.
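One way that proposal could look is sketched below. This is only an illustration under stated assumptions, not code from this PR or from PR #5919: the function name `makeStorageDevice` and its parameters are hypothetical, and the sketch leaves the caching mode at its default on macOS 14+ because the proposal only specifies the controller type there.

```swift
import Foundation
import Virtualization

/// Hypothetical helper: picks a storage device configuration by host version.
@available(macOS 11, *)
func makeStorageDevice(imageURL: URL, readOnly: Bool) throws -> VZStorageDeviceConfiguration {
    if #available(macOS 14, *) {
        // macOS 14+: attach the image through an NVMe controller.
        let attachment = try VZDiskImageStorageDeviceAttachment(url: imageURL, readOnly: readOnly)
        return VZNVMExpressControllerDeviceConfiguration(attachment: attachment)
    } else if #available(macOS 12, *) {
        // macOS 12 and 13: keep virtio, but disable host-side caching.
        let attachment = try VZDiskImageStorageDeviceAttachment(
            url: imageURL,
            readOnly: readOnly,
            cachingMode: .uncached,
            synchronizationMode: .full)
        return VZVirtioBlockDeviceConfiguration(attachment: attachment)
    } else {
        // macOS 11: the caching mode cannot be selected.
        let attachment = try VZDiskImageStorageDeviceAttachment(url: imageURL, readOnly: readOnly)
        return VZVirtioBlockDeviceConfiguration(attachment: attachment)
    }
}
```

The returned value can be added to the `storageDevices` array of a `VZVirtualMachineConfiguration`. Note that a guest sees an NVMe disk under a different device name than a virtio disk, so an existing Linux VM may need its boot configuration checked before switching.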
For now, in our tests the AOSP build works better with the cache disabled than with it enabled.
We found that with the cache disabled, the filesystem error no longer shows up after a reboot on a macOS 14.1.1 host. Maybe Apple has fixed this problem in macOS 14.1.1.
For me, the following combinations work reliably:
- NVMe with any caching mode
- virtio with `cached` caching mode
Details: #4840
Anyway, sparse files are interesting and would be cool to have.