halexan
I raised this issue before as well, and the author simply closed it without discussion. It seems the author hasn't been updating the project recently.
Hoping support for DeepSeek-V2 and DeepSeek-Coder-V2 can be added. vLLM 0.5.1 already supports DeepSeek-V2; see the [vllm 0.5.1 release](https://github.com/vllm-project/vllm/releases/tag/v0.5.1).
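For reference, a minimal vLLM sketch for loading DeepSeek-V2-Chat, assuming vllm >= 0.5.1 and enough GPU memory; the tensor-parallel size and context length below are illustrative, not a tuned configuration:

```python
# Minimal sketch, not an official recipe: run DeepSeek-V2-Chat with vLLM >= 0.5.1.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Chat",  # or a local checkpoint path
    tensor_parallel_size=8,                # adjust to the number of GPUs available
    trust_remote_code=True,                # DeepSeek-V2 ships custom modeling code
    max_model_len=8192,                    # shrink if the KV cache does not fit
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a quicksort function in Python."], params)
print(outputs[0].outputs[0].text)
```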
Looking forward to it!
Any updates on DeepSeek-V2?
> In order to efficiently deploy DeepSeek-V2 for service, we first convert its parameters into the precision of FP8. In addition, we also perform KV cache quantization (Hooper et al.,...
Tested DeepSeek-V2-Chat-0628 on 8*A800, served with:

```bash
python3 -m sglang.launch_server \
  --model-path /data/model-cache/deepseek-ai/DeepSeek-V2-Chat-0628 \
  --served-model-name deepseek-chat \
  --tp 8 \
  --enable-mla \
  --disable-radix-cache \
  --mem-fraction-static 0.87 \
  --schedule-conservativeness 0.1 \
  --chunked-prefill-size...
```
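Once the server is up, it exposes an OpenAI-compatible API; a minimal client sketch, assuming SGLang's default port 30000 and the `--served-model-name` used above:

```python
# Minimal client sketch: query the SGLang server started above.
# Assumes the default port 30000; pass --port to launch_server to change it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-chat",  # matches --served-model-name
    messages=[{"role": "user", "content": "Hello, which model are you?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```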
@Xu-Chen Does your 8*A800 setup have NVLink?
> vLLM doesn't support MoE FP8 models on Ampere. This is because vLLM uses Triton for its FusedMoE kernel, which doesn't support the FP8 Marlin mixed-precision gemm. See https://huggingface.co/neuralmagic/DeepSeek-Coder-V2-Instruct-FP8/discussions/1 >...
> @Xu-Chen So can we use sglang to run deepseek v2 232B? Thanks

Yes, you can, without quantization.