Chuanhong Li

Results: 16 comments by Chuanhong Li

> > In the Anyscale fork we saw a 50% speedup on bs=8 with a 68m-sized draft model on TP1/70B target model on TP8 and a 7B draft model on...

> Thanks for the information! Looking forward to the complete speculative decoding support! Thanks for your reply!

@yeoedward @Ying1123 @Kyriection Hi, is there an answer to the above question? Besides, I also want to know: when batching inference is used for llama, how should the hh_score be updated?

> @duyuxuan1486 hi! Have you ever encountered this error when running `bash scripts/streaming/eval.sh full`? > > from streaming_llm.utils import load, download_url, load_jsonl ModuleNotFoundError: No module named 'streaming_llm' https://github.com/FMInference/H2O/issues/8

> Hi, The HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multi sequences shortly, by modifying...

> > Hi, The HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multi sequences shortly, by...

> @hxer7963 > > What is happening is that it seems like `chunked_prefill` is disabled in your configuration. Since this model has very long `max_model_len=128k`, we need to reserve space...

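> > The TE initialization log lists the currently available RDMA devices and the currently selected GID index. I ran into a similar situation before, and it was caused by the wrong GID being selected. In PR [#947](https://github.com/kvcache-ai/Mooncake/pull/947) I fixed the earlier bug in selecting the GID index. > > You can also set the correct GID through MC_GID_INDEX. > > I am using a single node; for the log, see the one I posted above...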

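> > > > The TE initialization log lists the currently available RDMA devices and the currently selected GID index. I ran into a similar situation before, and it was caused by the wrong GID being selected. In PR [#947](https://github.com/kvcache-ai/Mooncake/pull/947) I fixed the earlier bug in selecting the GID index. > > > > You can also use MC_GID_INDEX to set the correct...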

> CC: [@alogfans](https://github.com/alogfans) can you help? Under what circumstances will this `cuMemCreate` fail? We used the SGLang:0.5.4 image (docker pull lmsysorg/sglang:v0.5.4) to deploy PD disaggregation on the same node (H20-3e). We...