Justin

Results: 22 comments by Justin

I got the same issue on Ubuntu 22.04 with Wayland, but when I switch to Xorg it works fine.

I got the same issue!

I am not an OpenAI subscriber; can I still use UFO? I followed your instructions, and my config file is as follows. I set the model to GPT-3.5: version: 0.1 API_TYPE: "openai"...
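For reference, a minimal sketch of the shape such a config might take; the `OPENAI_API_BASE`, `OPENAI_API_KEY`, and `OPENAI_API_MODEL` key names are assumptions on my part, so check UFO's own config template for the exact fields:

```yaml
# Hypothetical sketch of a UFO config for the public (non-Azure) OpenAI API.
# Key names below are assumptions; consult UFO's config template for the real ones.
version: 0.1
API_TYPE: "openai"            # "openai" for the public API rather than Azure
OPENAI_API_BASE: "https://api.openai.com/v1/chat/completions"  # assumed endpoint
OPENAI_API_KEY: "sk-..."      # your own API key
OPENAI_API_MODEL: "gpt-3.5-turbo"  # assumed model field
```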

> As you can see, the problem lies in the line `self.scale_base = torch.nn.Parameter(torch.FloatTensor(scale_base).cuda()).requires_grad_(sb_trainable)`, which, in a previous (bad) attempt at letting people use CUDA, forced the parameter to be on...
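To illustrate the point being quoted, here is a minimal sketch of the device-agnostic pattern (a hypothetical `Layer` class, not pykan's actual code): register the tensor as a plain `nn.Parameter` without a hard-coded `.cuda()`, so `module.to(device)` can move it wherever the caller wants.

```python
import torch

# Hypothetical minimal sketch (not pykan's actual class): create the
# parameter without a hard-coded .cuda() so it stays device-agnostic.
class Layer(torch.nn.Module):
    def __init__(self, scale_base, sb_trainable=True):
        super().__init__()
        # Registered parameters are moved along with the module by .to(device).
        self.scale_base = torch.nn.Parameter(
            torch.as_tensor(scale_base, dtype=torch.float32)
        ).requires_grad_(sb_trainable)

layer = Layer([1.0, 2.0])
# The caller, not the layer, decides the device.
layer.to("cuda" if torch.cuda.is_available() else "cpu")
```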

> That's strange. Could you please create a reproducible gist/snippet where I can try to reproduce your case in order to further expand the PR if needed? That would very...

> I just think, as the RuntimeError describes, you do not have to cast to float through `.float()`, or maybe cast it as double

Thanks,

```python
dataset['train_input'] = torch.from_numpy(train_input).float()...
```
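For anyone hitting the same error, a small self-contained illustration of the dtype mismatch behind that cast (the array name and shape are made up):

```python
import numpy as np
import torch

# NumPy defaults to float64, and torch.from_numpy preserves that dtype,
# which triggers "expected scalar type Float but found Double" against
# a float32 model. .float() casts the tensor down to float32.
train_input = np.random.rand(100, 2)          # dtype float64
x64 = torch.from_numpy(train_input)           # torch.float64
x32 = torch.from_numpy(train_input).float()   # torch.float32
print(x64.dtype, x32.dtype)                   # torch.float64 torch.float32
```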

@nv-guomingz Hi, I ran into the same issue with this command when using an RTX 4090:

```bash
python3 /tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py --model_dir /root/7b \
    --output_dir /root/converted/7b/f16/1gpu-int4_gptq \
    --dtype auto \
    --use_weight_only \
    --tp_size 1 \
    --weight_only_precision...
```

@DancingKitty Maybe you can try the command like this:

```bash
python3 /tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py --model_dir /root/7b \
    --output_dir /root/converted/7b/f16/1gpu \
    --dtype float16 \
    --use_weight_only \
    --tp_size 1 \
    --weight_only_precision int4
```

@mathijshenquet Hi, I followed your instructions and changed "decoupled" in the config.pbtxt of both the tensorrt_llm and tensorrt_llm_bls folders, but it still doesn't work: ![mode2](https://github.com/user-attachments/assets/e22d1a6b-65ff-4303-9278-80f839b0188f) ![mode](https://github.com/user-attachments/assets/4b670bdd-fd3d-4449-948a-d3c1ba82f134) ![mode3](https://github.com/user-attachments/assets/4ffeed59-2a17-4825-9647-1c24c286c092)

@mathijshenquet Thanks, it ran well after I modified the config.pbtxt of ensemble & tensorrt_llm_bls, but tensorrt_llm still doesn't work with

```
model_transaction_policy {
  decoupled: True
}
```

and the error:

```
root@gpu8:/triton_model_repo# curl -X POST...
```
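Not from the thread above, but as a hedged illustration of the usual follow-up: my understanding is that a model running with a decoupled transaction policy streams its responses, so it is typically queried through Triton's streaming `/v2/models/<name>/generate_stream` endpoint (or gRPC streaming) rather than the one-shot `/generate` endpoint. The model name, prompt, and port below are assumptions, not taken from the thread:

```python
import requests

# Hypothetical sketch: querying a decoupled TensorRT-LLM model through
# Triton's streaming generate endpoint (Server-Sent Events). Model name,
# prompt, and port are assumptions.
resp = requests.post(
    "http://localhost:8000/v2/models/tensorrt_llm_bls/generate_stream",
    json={"text_input": "What is machine learning?", "max_tokens": 64},
    stream=True,  # the decoupled model answers with a stream of SSE events
)
for line in resp.iter_lines():
    if line:  # each non-empty line is one "data: {...}" SSE event
        print(line.decode())
```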