dwq370
### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior  ### Expected Behavior _No response_ ### Steps To Reproduce 1. Fine-tune the INT4 model with my own data...
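3090 GPU, CUDA 11.1: running INT4 inference on a single card raises an error. Is this a CUDA version problem? What is the minimum CUDA version that MOSS supports?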
### System Info 4*NVIDIA L20 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks - [...
### System Info NVIDIA 2*L20 launch triton server with tensorrt-llm backend v0.12.0 in a container ### Who can help? _No response_ ### Information - [ ] The official example scripts...
### System Info Ubuntu 22.04 Triton image: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 and the version of trtllm-backend is 0.10.0 Model: qwen2-7b-instruct ### Who can help? _No response_ ### Information - [ ] The official...
step1: launch container
```
mkdir -p ~/nano-test
docker run --gpus all --net=host --privileged -v /dev/shm:/dev/shm --name nanoflow -v ~/nano-test:/code -it nvcr.io/nvidia/nvhpc:23.11-devel-cuda_multi-ubuntu22.04
```
step2: Install dependencies
```
git clone https://github.com/efeslab/Nanoflow.git
cd...