
CacheLib with io_uring

Open sriramsrao opened this issue 8 months ago • 2 comments

I am trying to get CacheLib to use io_uring for its I/O. I have confirmed that our Linux kernel has io_uring support enabled. The errors I get are:

E0516 05:41:49.355798   132 NavyRequestScheduler.cpp:203] [navy_writer_7] IO stalled. submitted 365 -> 365 completed 337 -> 337
E0516 05:41:49.495906   133 Device.cpp:557] [ctx_0] IO error: [req 0x7ef100020f00] idx 0 fd 10 op write offset 1073741824 size 1048576 data 0x7ef0a5fff000 resubmitted 0 len=-95 errno=4 (Interrupted system call)
E0516 05:41:49.852437   133 Device.cpp:557] [ctx_0] IO error: [req 0x7ef10001a870] idx 0 fd 10 op write offset 1073741824 size 1048576 data 0x7ef0a5fff000 resubmitted 0 len=-95 errno=4 (Interrupted system call)
E0516 05:41:49.970734   133 Device.cpp:557] [ctx_0] IO error: [req 0x7ef10001a870] idx 0 fd 10 op write offset 1073741824 size 1048576 data 0x7ef0a5fff000 resubmitted 0 len=-95 errno=4 (Interrupted system call)
...
I0516 05:44:30.887253   169 cache_manager.cc:174] Get segment failed for: 13075723188911306909
I0516 05:44:30.904218   169 cache_manager.cc:174] Get segment failed for: 8278234487427450299
I0516 05:44:30.920457   169 cache_manager.cc:174] Get segment failed for: 11674330521192113941
I0516 05:44:30.937139   169 cache_manager.cc:174] Get segment failed for: 13766734649169596872

Any I/O on the NVMe seems to be failing:

NVM MB read: 0
Num NVM evictions: 2565
Num Cache Evictions: 2565
NVM write errors: 950
Num NVM get miss: 0
NVM read errors: 0
Num NVM gets: 0
NVM MB written: 31
Num Cache get miss: 0
Num NVM puts: 2566
Num Cache gets: 0
Num Cache hits: 0
Num NVM put errors: 0
Num items: 4034

sriramsrao, May 16 '25 20:05

I see a similar issue. I am currently using a custom SSD cache. The design works correctly in synchronous mode (i.e., qDepth: 1), but when I switch to asynchronous mode (no FDP), similar errors occur.

I also tested BigHash, but the same issue occurred, and the run terminated after logging:

C0518 21:31:00.367145 1598172 NvmCache-inl.h:795] Disabling navy. Delete Failure. status = 4

The async-related configuration is:

navyReaderThreads=16
navyWriteThreads=16
navyMaxNumReads=128
navyMaxNumWrites=128
navyStackSizeKB=128

fd 6 op write offset 5810421760 size 262144 data 0x24106000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:24.867932 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de66803a760] idx 0 fd 6 op write offset 5821693952 size 262144 data 0x2420a000 resubmitted 0 len=-95 errno=11 (Resource temporarily unavailable)
E0518 21:20:24.867938 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de66802a010] idx 0 fd 6 op write offset 5821169664 size 262144 data 0x237e2000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:24.867943 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de66802fbd0] idx 0 fd 6 op write offset 5788139520 size 262144 data 0x236de000 resubmitted 0 len=-95 errno=11 (Resource temporarily unavailable)
E0518 21:20:24.867949 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de668028ce0] idx 0 fd 6 op write offset 5786042368 size 262144 data 0x234d6000 resubmitted 0 len=-95 errno=11 (Resource temporarily unavailable)
E0518 21:20:24.867954 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de66803ab70] idx 0 fd 6 op write offset 5821956096 size 262144 data 0x235da000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.868969 1596158 Device.cpp:569] [ctx_4] IO error: [req 0x7de59c0166d0] idx 0 fd 6 op read offset 5824172032 size 4096 data 0x7de688024000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.869688 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de594026f30] idx 0 fd 6 op write offset 5815926784 size 262144 data 0x24002000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.869723 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de5780257c0] idx 0 fd 6 op write offset 5821693952 size 262144 data 0x2420a000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.869729 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de59c019b60] idx 0 fd 6 op write offset 5815664640 size 262144 data 0x23aee000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.869735 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de5bc025920] idx 0 fd 6 op write offset 5821169664 size 262144 data 0x237e2000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.869742 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de668037360] idx 0 fd 6 op write offset 5823266816 size 262144 data 0x2430e000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.869747 1596163 Device.cpp:569] [ctx_3] IO error: [req 0x7de5cc01a0e0] idx 0 fd 6 op write offset 5821956096 size 262144 data 0x235da000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.870413 1596162 Device.cpp:569] [ctx_0] IO error: [req 0x7de5cc02b580] idx 0 fd 6 op write offset 5813043200 size 262144 data 0x2369d000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.870473 1596162 Device.cpp:569] [ctx_0] IO error: [req 0x7de568064610] idx 0 fd 6 op write offset 5805178880 size 262144 data 0x23cb5000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
E0518 21:20:25.870480 1596162 Device.cpp:569] [ctx_0] IO error: [req 0x7de57802f660] idx 0 fd 6 op write offset 5824839680 size 262144 data 0x23599000 resubmitted 0 len=-22 errno=11 (Resource temporarily unavailable)
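
When triaging logs like these, it can help to pull the fields out of each "IO error" line. The sketch below is a hypothetical helper, not part of CacheLib. It assumes the log format shown above, and it assumes that a negative `len` holds a negated Linux errno (on Linux, 95 is EOPNOTSUPP and 22 is EINVAL). That reading of `len` is an assumption and is not confirmed anywhere in this thread.

```python
import os
import re

# Matches the tail of a Device.cpp "IO error" line as it appears in this thread.
LINE_RE = re.compile(
    r"op (?P<op>\w+) offset (?P<offset>\d+) size (?P<size>\d+) "
    r".*?len=(?P<len>-?\d+) errno=(?P<errno>\d+)"
)


def parse_io_error(line):
    """Return the numeric fields of one IO error line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "op": m["op"],
        "offset": int(m["offset"]),
        "size": int(m["size"]),
        "len": int(m["len"]),
        "errno": int(m["errno"]),
    }


def describe(rec):
    """Summarize a parsed record; a negative len is read as -errno (assumption)."""
    length = rec["len"]
    hint = os.strerror(-length) if length < 0 else "n/a"
    return f"{rec['op']} size={rec['size']} len={length} ({hint}) errno={rec['errno']}"


sample = (
    "E0518 21:20:24.867932 1596163 Device.cpp:569] [ctx_3] IO error: "
    "[req 0x7de66803a760] idx 0 fd 6 op write offset 5821693952 size 262144 "
    "data 0x2420a000 resubmitted 0 len=-95 errno=11 (Resource temporarily unavailable)"
)
print(describe(parse_io_error(sample)))
```

If that reading holds, the `len` value may be more informative than the printed errno, since the errno shown differs between the two reports (4 and 11) while `len` is -95 or -22 in both.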

rainjuns, May 18 '25 12:05

I have seen this issue too. Try matching the deviceMaxWriteSize setting to the nows (Namespace Optimal Write Size) value of your SSD, which this command reports (requires nvme-cli version 2.x):

sudo nvme id-ns /dev/nvmeXn1 -H

For example, if nows=47, try setting deviceMaxWriteSize = (47+1)*4096 = 196,608. This assumes a 4096-byte logical block size; nows is a zero-based count of logical blocks, hence the +1.
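
The same arithmetic as a small sketch, in case your drive reports a different value. The function name and the default 4096-byte logical block size are assumptions for illustration; check the LBA format that `nvme id-ns` reports for your namespace.

```python
def device_max_write_size(nows: int, lba_size: int = 4096) -> int:
    """Convert a zero-based nows value (in logical blocks) to a size in bytes.

    lba_size is the namespace's logical block size in bytes; 4096 is an
    assumed default, not something read from the device.
    """
    if nows < 0 or lba_size <= 0:
        raise ValueError("nows must be >= 0 and lba_size must be > 0")
    return (nows + 1) * lba_size


print(device_max_write_size(47))  # 196608
```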

Let me know if that fixes your issue.

PapperYZ, Aug 29 '25 16:08