zhangyu68

Results: 13 comments by zhangyu68

> > Hi @zwd003 May you merge the latest main branch and fix the conflicts? Thanks.
>
> ok

hello, I encountered this error when the QPS was increased to 2....

```python
meta_instruction = "你一个名叫ChatLAW,由北京大学团队开发的人工智能助理:\n- 你旨在提供有无害且准确的回答。\n- 你必须拒绝回答非法的问题。\n- 你的回应不能含糊、指责、粗鲁、有争议、离题或防御性。\n- 你的回应必须有礼貌。"

query = "公司无故辞退摸鱼员工,是否触犯法律?"
# prompt = f"Consult:\n{consult}\nResponse:\n"
prompt = f"{meta_instruction}\nConsult:\n{query}\nResponse:\n"

inputs = tokenizer(prompt, return_tensors="pt")
inputs['input_ids'] = inputs['input_ids'].to(model.device)

with torch.no_grad():
    generation_output = model.generate(...
```
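For reference, the template logic in the snippet above can be isolated into a small pure function (the helper name `build_prompt` and the placeholder strings are my own, not from the original):

```python
def build_prompt(meta_instruction: str, query: str) -> str:
    """Prepend the system instruction, then wrap the user query in the
    Consult/Response template used by the snippet above."""
    return f"{meta_instruction}\nConsult:\n{query}\nResponse:\n"

# Example with placeholder strings standing in for the Chinese originals.
prompt = build_prompt("You are a legal assistant.",
                      "Is dismissal without cause legal?")
```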

I ran into a similar issue: under the same LoRA SFT configuration, qwen-14b-chat reaches 90%+ GPU utilization, while the MoE model only reaches around 40%. I am using the llama-factory training framework; environment info is as follows:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Package...
```

Still searching for a solution.

How can I export an ONNX model from llava-1.5-7b-hf? When I run this command, I get an error:

```
optimum-cli export onnx --model /workspace/[email protected]/original_models/llava-1.5-7b-hf onnx_model/llava-v1.5-7b --task image-to-text-with-past --trust-remote-code
```

`Traceback...

> > I am also getting `ImportError: /lib/python3.8/site-packages/flash_attn_2_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi`
>
> I got the same error with PyTorch 2.3.0 when I installed it with the [wheel file](https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.7/flash_attn-2.5.7+cu122torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl).

I...
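An `undefined symbol` error like the one above usually means the flash-attn wheel was built against a different PyTorch/CUDA ABI than the one installed. As a rough illustration (the helper and its regex are my own sketch, not part of flash-attention), the wheel's compatibility tags can be read straight out of its filename and compared against your environment:

```python
import re

def parse_flash_attn_wheel(filename: str) -> dict:
    """Extract the CUDA, torch, and cxx11abi tags embedded in a
    flash-attn wheel filename, e.g.
    flash_attn-2.5.7+cu122torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
    """
    m = re.search(r"\+cu(\d+)torch([\d.]+)cxx11abi(TRUE|FALSE)", filename)
    if not m:
        raise ValueError("not a tagged flash-attn wheel name")
    return {
        "cuda": m.group(1),          # e.g. "122" for CUDA 12.2
        "torch": m.group(2),         # e.g. "2.3"
        "cxx11abi": m.group(3) == "TRUE",
    }
```

If the `torch` tag does not match `torch.__version__` (or the ABI flag differs from how your PyTorch was built), the loader fails with exactly this kind of missing-symbol error.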

我用这个分支导出成功了,环境是: onnx 1.16.1 onnxruntime-gpu 1.18.1 opencv-python 4.10.0.84 openpyxl 3.1.3 optimum 1.20.0.dev0 cuda 12.1 ![image](https://github.com/user-attachments/assets/bb2555cc-404a-4389-9767-8c8ecab9df88)

The same issue on tensorrt-llm 0.7.1:

```
[TensorRT-LLM][ERROR] Encountered error for requestId 487602187: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: mNextBlocks.empty() (/home/jenkins/agent/workspace/LLM/release-0.7/L0_PostMerge/llm/cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:160)
1  0x7fd6cf49d68d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1668d) [0x7fd6cf49d68d]
2  0x7fd6cf4a4ebf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1debf)...
```
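For intuition, `mNextBlocks.empty()` is an invariant saying a KV-cache block may only be released once no later blocks still chain off it. A toy sketch of that invariant (all names are illustrative; this is not TensorRT-LLM's actual implementation):

```python
class ToyKVCacheBlock:
    """Minimal stand-in for a linked KV-cache block; illustrative only."""

    def __init__(self, block_id: int):
        self.block_id = block_id
        self.next_blocks = []  # blocks that still chain off this one

    def free(self) -> int:
        # Mirrors the failed assertion in the trace above: releasing a
        # block that still has successors indicates corrupted bookkeeping.
        assert not self.next_blocks, "next_blocks must be empty before free"
        return self.block_id
```

Freeing a block while a successor is still attached trips the same kind of assertion the error log shows.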


> I fixed this issue by editing plugin/plugin.py and changing all the dataclass fields in PluginConfig to have init=True

It works, thanks.
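For context, the effect of that `init=True` fix can be reproduced with a minimal dataclass (the class and field names below are placeholders, not the real `PluginConfig` schema): a field declared with `init=False` is excluded from the generated `__init__`, so flipping it to `init=True` is what lets callers pass it to the constructor.

```python
from dataclasses import dataclass, field, fields

@dataclass
class DemoPluginConfig:
    # With init=False these fields could not be set via the constructor;
    # init=True (the reported fix) makes each a normal __init__ argument.
    gpt_attention_plugin: str = field(default="disable", init=True)
    remove_input_padding: bool = field(default=False, init=True)

cfg = DemoPluginConfig(gpt_attention_plugin="float16")
```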