Zhanxiang (Patrick) Huang

129 comments by Zhanxiang (Patrick) Huang

The idea looks promising! How about:

1. Run a micro-benchmark to compare the overhead of S3-FIFO and LRU.
2. Run a perf test to compare the cache efficiency of S3-FIFO and LRU.
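As a rough illustration of the comparison suggested above, here is a minimal sketch that replays one synthetic trace through an LRU cache and a simplified S3-FIFO cache and reports hit ratio and time per access. Everything in it is an assumption made for illustration: the class names, the 10% small-queue ratio, the ghost-queue bound, and the Zipf-like trace are hypothetical and are not taken from the RisingWave code base or its benchmarks.

```python
import random
import time
from collections import OrderedDict, deque


class LRUCache:
    """Plain LRU over keys only; values are omitted to keep the sketch small."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Return True on a hit, False on a miss (the key is then admitted)."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        self.entries[key] = None
        return False


class S3FIFOCache:
    """Simplified S3-FIFO: a small FIFO, a main FIFO, and a ghost set of keys."""

    def __init__(self, capacity, small_ratio=0.1):  # ratio is an assumed default
        self.capacity = capacity
        self.small_cap = max(1, int(capacity * small_ratio))
        self.main_cap = capacity - self.small_cap
        self.small, self.main = deque(), deque()
        self.ghost = OrderedDict()  # recently evicted keys (no data kept)
        self.freq = {}              # resident key -> access counter capped at 3

    def access(self, key):
        if key in self.freq:
            self.freq[key] = min(self.freq[key] + 1, 3)
            return True
        in_ghost = key in self.ghost
        if in_ghost:
            del self.ghost[key]
        while len(self.freq) >= self.capacity:
            self._evict()
        # Keys seen recently (still in ghost) go straight to the main queue.
        (self.main if in_ghost else self.small).append(key)
        self.freq[key] = 0
        return False

    def _evict(self):
        if len(self.small) >= self.small_cap or not self.main:
            key = self.small.popleft()
            if self.freq[key] > 1:
                self.freq[key] = 0
                self.main.append(key)  # promote a re-accessed key
            else:
                del self.freq[key]
                self.ghost[key] = None
                if len(self.ghost) > self.main_cap:
                    self.ghost.popitem(last=False)
        else:
            key = self.main.popleft()
            if self.freq[key] > 0:
                self.freq[key] -= 1
                self.main.append(key)  # give the key another pass
            else:
                del self.freq[key]


def hit_ratio(cache, trace):
    """Replay the trace and return the fraction of accesses that hit."""
    hits = sum(cache.access(key) for key in trace)
    return hits / len(trace)


def zipf_like_trace(n_keys, length, seed=0):
    """Skewed synthetic trace: key i is drawn with weight 1 / (i + 1)."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(n_keys)]
    return rng.choices(range(n_keys), weights=weights, k=length)


if __name__ == "__main__":
    trace = zipf_like_trace(n_keys=10_000, length=200_000)
    for name, cache in (("LRU", LRUCache(1_000)), ("S3-FIFO", S3FIFOCache(1_000))):
        start = time.perf_counter()
        ratio = hit_ratio(cache, trace)
        elapsed = time.perf_counter() - start
        print(f"{name}: hit ratio {ratio:.3f}, {elapsed / len(trace) * 1e9:.0f} ns/access")
```

The numbers from a Python sketch like this only indicate relative cache efficiency on one synthetic workload. Measuring real overhead would need a benchmark against the actual cache implementation, using traces that resemble production access patterns.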

> I prefer to do some data analysis in the flush phase and perform a split on the SST to promote boundary alignment.

By split you mean putting data related...

Note that re-running doesn't help: the recovery simulation test always gets stuck when running with `MADSIM_TEST_SEED=5`.

> Previously fixed issue: #14104.
>
> The fix should be already in 1.7. But this bug could be due to similar reasons. The log indeed stops, indicating that the...

We can use this branch for repro: https://github.com/risingwavelabs/risingwave/commits/patrick/recovery-stuck-repro/

> > > Root cause is deadlock between barrier recovery and drop stream job.

Great finding! Just curious: how do you debug the issue? Did you use any deadlock detection...

Memtable spill is enabled: https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?from=1708542494000&orgId=1&to=1708547688000&var-data_source=PE59595AED52CF917&var-namespace=longcmkf-20240221-190330&var-pod=benchmark-risingwave-compute-c-0&var-search=mem_table_spill

I think it is similar to https://github.com/risingwavelabs/risingwave/issues/15057#issuecomment-1961343919. It is already fixed in #15232.

What is the value of `object_store_read_timeout`? By default it is 8 minutes. As in #15209, it seems that the timeout didn't trigger here either.

Potential root cause found: https://github.com/risingwavelabs/risingwave/issues/15209#issuecomment-1963618701