lebronjamesking

Results: 15 comments by lebronjamesking

> > Hi everyone, how can inference for Taiyi Stable Diffusion be sped up? At half precision it takes roughly 12 s.
> > The method I'm currently trying is to convert the VAE, UNet, etc. to ONNX and then load them with StableDiffusionOnnxPipeline.
> > Are there any other approaches besides this?
>
> Could you share how much speedup the ONNX conversion actually gives?

Same question here.

> @hiyouga Merging LoRA weights into a quantized model is not supported.
>
> I see that QLoRA can train on top of a quantized model, so can a QLoRA adapter be merged back into the quantized model? My plan was to train with QLoRA and then merge.

Same question: how do you merge the adapter into a GPTQ-quantized model?
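For context on why this merge is unsupported: a GPTQ model stores packed low-bit weights, so the dense LoRA update cannot simply be added into them; the common workaround is to merge the adapter into the full-precision base model first and quantize afterwards. Numerically, merging just folds the low-rank update into the weight matrix, as in this toy NumPy sketch (all shapes and values here are illustrative, not from any real model):

```python
import numpy as np

# Toy illustration of "merging LoRA": W_merged = W + (alpha / r) * B @ A.
# With full-precision W this is a plain in-place addition; with packed
# int4 (GPTQ) weights there is no dense W to add the delta into.
d, r, alpha = 4, 2, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))   # base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection

W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces the two-path (base + adapter) forward pass.
x = rng.standard_normal(d)
y_merged = W_merged @ x
y_two_path = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(y_merged, y_two_path)
```

In practice the full-precision merge is typically done with PEFT's `merge_and_unload()` on an unquantized base model, and GPTQ quantization is then re-run on the merged checkpoint as a separate step.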

I have the same question: how do you run a dataset evaluation on a local model instance?

> export VLLM_WORKER_MULTIPROC_METHOD=spawn
>
> Please set this environment variable before running a multi-GPU vLLM evaluation.

Where should I put this line, given that I launch everything with `python run.py` from the start?
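Two equivalent options when launching with `python run.py`: export the variable in the shell first (`export VLLM_WORKER_MULTIPROC_METHOD=spawn` on its own line, then `python run.py`), or set it at the very top of `run.py` in Python. A minimal sketch of the second option (assuming nothing in the script imports vLLM before this line executes):

```python
import os

# Must run before vLLM (or anything that imports it) is loaded, since the
# variable is read when the vLLM workers are set up; setting it later in
# the script may have no effect.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

# ... the rest of run.py (imports and the evaluation entry point) goes below.
```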

> aime2024_gen_6e39a4

Do you know how to point the config to a local path if I have downloaded the dataset locally?

> Try removing colocate_actor_ref? My other parameters are the same as yours, except I'm not using vLLM (environment issues), and my micro_rollout_batch_size is even larger than yours.

Do you mean setting vllm_num_engines=0?

Same question: how do you actually determine train_batch_size and rollout_batch_size?

> please use train_ppo_ray.py

Basically, I cannot run Ray with vLLM. I have one node with 4 A100 cards.

By the way, I think verl's SGLang multi-turn tool calling is working: https://github.com/volcengine/verl/blob/54b2677/examples/sglang_multiturn/README.md

I support this. As of now, I don't think OpenCompass has a ready-to-use multi-GPU framework yet; for example, Qwen's QwQ-32B model is not supported for vLLM fast inference. There are many internal bugs.