dshwei

Results: 3 comments by dshwei

Reducing the --gpu-memory-utilization value may also resolve this problem.
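For example, a lower reservation can be passed on the serve command line (the model path and the 0.7 value below are illustrative placeholders, not from the original issue):

```shell
# Lower --gpu-memory-utilization so vLLM reserves a smaller fraction of
# GPU memory for its KV cache and weights up front (default is 0.9).
vllm serve /path/to/model --gpu-memory-utilization 0.7
```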

When you want to use PEFT checkpoints to continue fine-tuning, PeftModel.from_pretrained's parameter is_trainable=True is required.
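A minimal sketch of what this looks like; the base model name and adapter path are placeholder assumptions, not from the original comment:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder base model and adapter checkpoint paths.
base = AutoModelForCausalLM.from_pretrained("base-model-name")

# is_trainable=True keeps the adapter weights trainable; without it,
# PeftModel.from_pretrained loads the adapter in inference mode (frozen),
# so a subsequent fine-tuning run would not update the adapter at all.
model = PeftModel.from_pretrained(
    base, "path/to/peft-checkpoint", is_trainable=True
)
model.train()
```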

vllm 0.7.1, torch 2.5.1. When using this vLLM version, set VLLM_TORCH_PROFILER_DIR=./traces/ and run the command as follows: VLLM_TORCH_PROFILER_DIR=./traces/ vllm serve /workspace/models/DeepSeek-V2-Lite-Chat --gpu-memory-utilization 0.80 --max-model-len 8000 --max-num-batched-tokens 32000...
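Once the server is up with VLLM_TORCH_PROFILER_DIR set, traces are captured by bracketing requests with vLLM's profiling endpoints. A sketch, assuming the server listens on localhost:8000 and the request body values are placeholders:

```shell
# Start a torch profiler capture on the running vLLM server.
curl -X POST http://localhost:8000/start_profile

# Send a request to be profiled (model name and prompt are placeholders).
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "DeepSeek-V2-Lite-Chat", "prompt": "Hello", "max_tokens": 16}'

# Stop the capture; the trace is written under VLLM_TORCH_PROFILER_DIR.
curl -X POST http://localhost:8000/stop_profile
```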