Chuanhong Li comments

Results: 16 comments of Chuanhong Li

[Speculative decoding] [Help wanted] [Performance] Optimize draft-model speculative decoding

> > In the Anyscale fork we saw a 50% speedup on bs=8 with a 68m-sized draft model on TP1/70B target model on TP8 and a 7B draft model on...

[Speculative decoding] [Help wanted] [Performance] Optimize draft-model speculative decoding

> Thanks for the information! Looking forward to the complete speculative decoding support! Thanks for your reply!

HH scores summed along batch dimension

@yeoedward @Ying1123 @Kyriection Hi, is there an answer to the above question? Also, I would like to know how to update the `hh_score` when batched inference is used for llama.

TASK=xsum HH_SIZE=256 RECENT_SIZE=256 Model=llama-7b and the rouge2 of h2o is low

> @duyuxuan1486 Hi! Have you ever encountered this error when running `bash scripts/streaming/eval.sh full`? > > `from streaming_llm.utils import load, download_url, load_jsonl` `ModuleNotFoundError: No module named 'streaming_llm'` https://github.com/FMInference/H2O/issues/8

HH scores summed along batch dimension

> Hi, The HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multi sequences shortly, by modifying...
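A minimal sketch of what sequence-independent accumulation could look like, assuming attention probabilities arrive as a nested list indexed `[batch][head][query][key]`. The function name `accumulate_hh_scores` and the data layout are hypothetical and chosen for illustration; this is not the H2O repository's implementation. The point is only that scores are summed over query positions and kept separate per sequence, rather than summed along the batch dimension.

```python
def accumulate_hh_scores(attn_weights, hh_scores=None):
    """Accumulate heavy-hitter scores separately for every sequence.

    attn_weights: nested list [batch][head][query][key] of attention
                  probabilities (hypothetical layout for this sketch).
    hh_scores:    previous result [batch][head][key], or None on the
                  first call.
    Returns a new nested list [batch][head][key].
    """
    updated = []
    for b, per_head in enumerate(attn_weights):
        head_scores = []
        for h, rows in enumerate(per_head):
            # Sum over query positions only; the batch index is never reduced.
            key_sums = [sum(column) for column in zip(*rows)]
            if hh_scores is not None:
                prev = list(hh_scores[b][h])
                # Keys appended since the last step start from a zero score.
                prev += [0.0] * (len(key_sums) - len(prev))
                key_sums = [p + k for p, k in zip(prev, key_sums)]
            head_scores.append(key_sums)
        updated.append(head_scores)
    return updated


# Two sequences, one head: a 2-token prefill followed by one decode step.
prefill = [
    [[[1.0, 0.0], [0.5, 0.5]]],   # sequence 0
    [[[0.0, 1.0], [0.0, 1.0]]],   # sequence 1
]
decode = [
    [[[0.25, 0.25, 0.5]]],        # sequence 0
    [[[0.125, 0.125, 0.75]]],     # sequence 1
]
scores = accumulate_hh_scores(prefill)
scores = accumulate_hh_scores(decode, scores)
```

With this layout each sequence keeps its own score vector, so which tokens count as heavy hitters for one sequence does not depend on what else is in the batch.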

HH scores summed along batch dimension

> > Hi, The HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multi sequences shortly, by...

[Bug]: GPU Memory Utilization Lower Than Expected with --enable-prefix-caching

> @hxer7963 > > What is happening is that it seems like `chunked_prefill` is disabled in your configuration. Since this model has very long `max_model_len=128k`, we need to reserve space...

[Usage]: mooncake rdma error

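> > The TE initialization log shows the currently available RDMA devices and the currently selected GID index. I ran into a similar situation before, caused by a wrong GID selection; I fixed the earlier GID index selection error in this PR [#947](https://github.com/kvcache-ai/Mooncake/pull/947). > > You can also set the correct GID via MC_GID_INDEX. > > I am using a single node; for the log, see the one I posted above...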

[Usage]: mooncake rdma error

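> > > > The TE initialization log shows the currently available RDMA devices and the currently selected GID index. I ran into a similar situation before, caused by a wrong GID selection; I fixed the earlier GID index selection error in this PR [#947](https://github.com/kvcache-ai/Mooncake/pull/947). > > > > You can also use MC_GID_INDEX to set the correct...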

[Bug]: p2p nvlink usage

> CC: [@alogfans](https://github.com/alogfans) can you help? Under what circumstances will this `cuMemCreate` fail? We used the SGLang 0.5.4 image (`docker pull lmsysorg/sglang:v0.5.4`) to deploy PD disaggregation on the same node (H20-3e). We...