
recover meta block crashes intermittently in GitHub CI and on local build machines

Open yamingk opened this issue 1 year ago • 3 comments

/home/runner/.conan/data/homestore/6.0.1///build/64eb2822c4463ffd57405ab6c52e9954705502c6/src/lib/meta/meta_blk_service.cpp:1126: void homestore::MetaBlkService::recover_meta_block(homestore::meta_blk*): Assertion `0' failed. [03/25/24 19:47:05+00:00] [critical] [test_meta_blk_mgr] [7169] [meta_blk_service.cpp:1126] ******************** Assertion failure: =====> Expected '726044547' to be == to '3060308851' [type=Test_Rand_Load], CRC mismatch: 726044547/3060308851, on mblk bid: blk#=158081 count=1 chunk=0, context_sz: 152064

https://github.com/eBay/HomeStore/actions/runs/8423765023/job/23072604152

It can also be hit on a local build machine.

yamingk avatar Mar 29 '24 21:03 yamingk

Also seen here: https://github.com/eBay/HomeStore/actions/runs/9020268861/job/24785052596?pr=403

yamingk avatar May 09 '24 21:05 yamingk

If we read the logs carefully, there is always a write failure before the restart. I cannot tell whether the failed write is exactly the metablk, as we don't log the offset of the metablk, but the lengths do match (write_size equals context_sz in both runs).

In this run: https://github.com/eBay/HomeStore/actions/runs/8423765023/job/23072604152

Write failure:

Warning:  19:46:59+00:00] [warning] [test_meta_blk_mgr] [7169] [drive_interface.cpp:435:sync_write] Error during write offset=284729344 write_size=152064 written_size=-1 errno=22 fd=35

Assert:

[03/25/24 19:47:05+00:00] [critical] [test_meta_blk_mgr] [7169] [meta_blk_service.cpp:1126] ******************** Assertion failure: =====> Expected '726044547' to be == to '3060308851' [type=Test_Rand_Load], CRC mismatch: 726044547/3060308851, on mblk bid: blk#=158081 count=1 chunk=0, context_sz: 152064

In this run: https://github.com/eBay/HomeStore/actions/runs/9020268861/job/24785052596?pr=403

Write failure:

Warning:  16:42:56+00:00] [warning] [test_meta_blk_mgr] [6627] [drive_interface.cpp:435:sync_write] Error during write offset=428580864 write_size=33792 written_size=-1 errno=22 fd=37

Assert:

[05/09/24 16:43:01+00:00] [critical] [test_meta_blk_mgr] [6627] [meta_blk_service.cpp:1126] ******************** Assertion failure: =====> Expected '2845276984' to be == to '2914764139' [type=Test_Rand_Load], CRC mismatch: 2845276984/2914764139, on mblk bid: blk#=158125 count=1 chunk=0, context_sz: 33792

xiaoxichen avatar May 15 '24 10:05 xiaoxichen

The error code is 22 (EINVAL), which the write(2) man page describes as:

EINVAL
fd is attached to an object which is unsuitable for writing; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned.

I believe it is our issue, not an environment issue. I checked that the offset is aligned to 4K and the size is aligned to 512, but I am not sure whether the buffer is aligned. The offset is also well below 1GB, so we are unlikely to be hitting any size boundary. Regarding logging, I suggest adding more logs in iomgr around the write failure, especially dumping the buffer address as well as the FD open flags.
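To make the suggested logging concrete, here is a minimal sketch of the three alignment checks (buffer address, size, offset) plus a dump of the FD open flags. The names `is_aligned` and `describe_write` are hypothetical and are not part of iomgr or HomeStore. The required alignment is passed in as a parameter because it depends on the device and filesystem, so the 512 and 4096 values used here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <string>

// True if `value` is a multiple of `alignment`; `alignment` must be a power of two.
inline bool is_aligned(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value & (alignment - 1)) == 0;
}

// Hypothetical helper (not an iomgr API): formats every input that can
// trigger EINVAL on an O_DIRECT write, so a failure log is self-explanatory.
inline std::string describe_write(int fd, const void* buf, std::size_t size,
                                  std::uint64_t offset, std::uint64_t alignment) {
    const auto addr = reinterpret_cast<std::uintptr_t>(buf);
    const int flags = ::fcntl(fd, F_GETFL);  // returns -1 if fd is not valid
    bool direct = false;
#ifdef O_DIRECT
    direct = (flags != -1) && ((flags & O_DIRECT) != 0);
#endif
    char line[256];
    std::snprintf(line, sizeof(line),
                  "fd=%d flags=0x%x o_direct=%d buf=%p buf_aligned=%d "
                  "size=%zu size_aligned=%d offset=%llu offset_aligned=%d",
                  fd, static_cast<unsigned>(flags), direct ? 1 : 0, buf,
                  is_aligned(addr, alignment) ? 1 : 0,
                  size, is_aligned(size, alignment) ? 1 : 0,
                  static_cast<unsigned long long>(offset),
                  is_aligned(offset, alignment) ? 1 : 0);
    return std::string(line);
}
```

Logging `describe_write(fd, buf, size, offset, alignment)` next to the existing "Error during write" message would show which of the three values is off. One observation from the numbers already in the logs: both failing sizes (152064 and 33792) are multiples of 512 but not of 4096, so if the device happened to require 4096-byte alignment for the length, that alone could explain the EINVAL. This is a hypothesis, not something the logs confirm.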

But since we get a CRC mismatch, we probably ended up with a partial (or entirely missing) write. That is much more likely than bit rot.
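The reasoning can be illustrated with a small sketch: if only part of a payload lands on disk, the bytes read back after restart are a mix of new and stale data, and their CRC no longer matches the CRC computed over the intended payload. The `crc32_simple` routine below is a plain bitwise CRC-32 written for illustration; it is an assumption and not the CRC implementation HomeStore uses. `apply_partial_write` is likewise a hypothetical stand-in for the disk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320). Illustration only.
inline std::uint32_t crc32_simple(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Simulates a write where only the first `written` bytes of `payload`
// replace the stale contents of `on_disk`; the rest stays stale.
inline std::vector<std::uint8_t> apply_partial_write(std::vector<std::uint8_t> on_disk,
                                                     const std::vector<std::uint8_t>& payload,
                                                     std::size_t written) {
    for (std::size_t i = 0; i < written && i < payload.size() && i < on_disk.size(); ++i) {
        on_disk[i] = payload[i];
    }
    return on_disk;
}
```

Comparing `crc32_simple` over the intended payload with `crc32_simple` over the result of `apply_partial_write` mirrors the check that fails in recover_meta_block: the stored CRC describes the intended payload, while the recomputed one describes whatever actually reached the disk.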

xiaoxichen avatar May 15 '24 10:05 xiaoxichen