lebronjamesking

Results: 15 comments by lebronjamesking

> > Hi everyone, how can inference for Taiyi Stable Diffusion be sped up? At half precision it takes roughly 12 s.
> > The method I'm currently trying is to convert the VAE, UNet, etc. to ONNX and then load them with StableDiffusionOnnxPipeline.
> > Are there any other approaches besides this?
>
> Could you share how much speedup the ONNX conversion actually gives?

Same question here.

> @hiyouga Merging LoRA weights into a quantized model is not supported.
>
> I see that QLoRA can train on top of a quantized model, so can a QLoRA adapter be merged back into the quantized model? My plan was to train with QLoRA and then merge.

Same question: how do you merge the adapter into a GPTQ-quantized model?
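For context on why this merge is unsupported: a GPTQ model stores packed low-bit weights, so the dense LoRA update cannot simply be added into them; the common workaround is to merge the adapter into the full-precision base model first and quantize afterwards. Numerically, merging just folds the low-rank update into the weight matrix, as in this toy NumPy sketch (all shapes and values here are illustrative, not from any real model):

```python
import numpy as np

# Toy illustration of "merging LoRA": W_merged = W + (alpha / r) * B @ A.
# With full-precision W this is a plain in-place addition; with packed
# int4 (GPTQ) weights there is no dense W to add the delta into.
d, r, alpha = 4, 2, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))   # base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection

W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces the two-path (base + adapter) forward pass.
x = rng.standard_normal(d)
y_merged = W_merged @ x
y_two_path = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(y_merged, y_two_path)
```

In practice the full-precision merge is typically done with PEFT's `merge_and_unload()` on an unquantized base model, and GPTQ quantization is then re-run on the merged checkpoint as a separate step.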

I have the same question: how do you run a dataset evaluation on a local model instance?

> export VLLM_WORKER_MULTIPROC_METHOD=spawn
>
> Please set this environment variable before running a multi-GPU vLLM evaluation.

Where should I put this line, given that I launch everything with `python run.py` from the start?
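Two equivalent options when launching with `python run.py`: export the variable in the shell first (`export VLLM_WORKER_MULTIPROC_METHOD=spawn` on its own line, then `python run.py`), or set it at the very top of `run.py` in Python. A minimal sketch of the second option (assuming nothing in the script imports vLLM before this line executes):

```python
import os

# Must run before vLLM (or anything that imports it) is loaded, since the
# variable is read when the vLLM workers are set up; setting it later in
# the script may have no effect.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

# ... the rest of run.py (imports and the evaluation entry point) goes below.
```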

> aime2024_gen_6e39a4

Do you know how to point the config to a local path if I have downloaded the dataset locally?

> Try removing colocate_actor_ref? My other parameters are the same as yours, except I'm not using vLLM (environment issues), and my micro_rollout_batch_size is even larger than yours.

Do you mean setting vllm_num_engines=0?

Same question: how do you actually determine train_batch_size and rollout_batch_size?

> please use train_ppo_ray.py

Basically, I cannot run Ray with vLLM. I have one node with 4 A100 cards.

By the way, I think verl's SGLang multi-turn tool calling is working: https://github.com/volcengine/verl/blob/54b2677/examples/sglang_multiturn/README.md

I support this. As of now, I don't think OpenCompass has a ready-to-use multi-GPU framework yet; for example, Qwen's QwQ-32B model is not supported for vLLM fast inference. There are many internal bugs.