Questions about trace files when running cachebench
Hello, thank you for maintaining this great project!
I found that CacheLib provides several traces here, and I have two questions about testing them with cachebench.
- Do trace files include `set` operations caused by misses on `get` operations?
  - I found that CacheLib has an option (`enableLookAside`) for performing `set` operations on misses of `get` operations; see the sketch after this list for roughly the behavior I mean.
  - However, I wonder whether such behavior is already captured in the trace files.
- If there are a lot of `get` operations before `set` operations, which can be captured in trace files, is the miss ratio reported by CacheLib still accurate?
  - Depending on how the trace files were collected, only `get` operations may be captured, and they will cause a lot of misses, inflating the miss ratio.
  - In addition, if `enableLookAside` is turned on, many `set` operations will be generated for the same key-value pair.
  - In production, `get` operations for the same key might be queued while waiting for the response to the first miss.
  - Please refer to the following trace lines (from the first file of kvcache/202206); key `1665497896` will generate a lot of misses:
```
key,op,size,op_count,key_size
1668757755,SET,82,1,40
1668757755,GET,0,1,40
1668757805,SET,208,1,63
1668757805,GET,0,1,63
1665498006,GET,104,2,64
1666258101,GET,81,2,23
1665497896,GET,169,18,78
1665702915,SET,109,1,40
1665702915,GET,0,1,40
1665497896,GET,169,18,78
```
- For requests with the same key, what is the difference between (1) a trace line with `op_count` larger than 1 and (2) multiple trace lines with `op_count` = 1?
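
For reference, here is roughly the look-aside behavior I am asking about (a minimal sketch in Python, not CacheLib code; `backend_fetch` and the dict-based cache are placeholders):

```python
# A rough sketch of the look-aside (cache-aside) read path, not CacheLib code.
# On a GET miss, the caller fetches from the backing store and then SETs the
# value, which is the extra SET traffic that enableLookAside would add on replay.
cache = {}

def backend_fetch(key):
    # Placeholder for the backing-store lookup.
    return f"value-for-{key}"

def lookaside_get(key):
    value = cache.get(key)
    if value is None:            # GET miss
        value = backend_fetch(key)
        cache[key] = value       # the implied SET after the miss
    return value

print(lookaside_get("1665497896"))  # first call misses and sets
print(lookaside_get("1665497896"))  # second call hits
```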
1). Yes, the traces include "SET" operations that are triggered by misses on "GET" operations in our systems. There are some exceptions in the KV traces: notably, some clients do a "SET" first and then a "GET" (after some minutes or hours). These clients are basically prefetching data. They are rare in the traces compared to the regular set-after-a-miss cache workloads.
2). `enableLookAside` should only be used when you filter out all the "SET" operations from the original trace. This is useful when the cache size you benchmark is drastically different from the original cache configuration, as it lets CacheBench behave like an actual cache instead of just replaying the originally recorded sets. (E.g., a cache with an original hit rate of 90% would issue far fewer sets than a smaller cache at a 50% hit rate receiving the same GET workload.)
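
If it helps, filtering the recorded "SET" rows out of the trace before replaying could look something like this (a minimal sketch assuming the column layout shown above; file names are illustrative):

```python
import csv

# Drop the recorded SET rows so that, during a replay with look-aside enabled,
# the only SETs come from cachebench reacting to GET misses.
# Assumes the columns shown in the question: key,op,size,op_count,key_size.
with open("kvcache_trace.csv", newline="") as src, \
        open("kvcache_trace_gets_only.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["op"].upper() != "SET":
            writer.writerow(row)
```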
3). `op_count` is the number of requests we saw for this key in that "second" when we originally collected the traces. Each row in our trace represents a second's worth of requests per key per operation.
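
To make that concrete, a row like `1665497896,GET,169,18,78` stands for 18 GETs for that key within one second, so per-key totals should be weighted by `op_count` rather than by row count. A quick sketch, again assuming the column names shown above:

```python
import csv
from collections import Counter

# Each trace row aggregates one second of requests for a (key, op) pair, so
# total request counts must sum op_count rather than count rows.
totals = Counter()
with open("kvcache_trace.csv", newline="") as src:
    for row in csv.DictReader(src):
        totals[(row["key"], row["op"])] += int(row["op_count"])

# e.g. the two "1665497896,GET,169,18,78" rows above contribute 36 GETs total.
for (key, op), count in totals.most_common(5):
    print(key, op, count)
```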
@therealgymmy Thank you for the response. I appreciate it!