Johannes Vass

Results 29 comments of Johannes Vass

The pull request #266 fixes the problem (in the sense that the server can successfully load the new model again).

Unfortunately that's all I currently have. If the problem occurs again I can perhaps gather more logs and tell more precisely what my last actions were.

This PR solves my issue described [here](https://github.com/huggingface/text-generation-inference/issues/2838#issuecomment-2586691348), thank you!

I just noticed that you changed the implementation between me building the image and my previous message. Should I test again?

Do you have a way of reproducing the error more quickly than building the entire image from scratch every time? And do you manage to get to the exact compiler...

I get the same errors with the latest docker image. So far I tested Mixtral 8x7B and llama 3.3 and both had the same error. In short, this command ```...

> [@scriptator](https://github.com/scriptator) maybe you can try building the image that I have here [#2848](https://github.com/huggingface/text-generation-inference/pull/2848) and see if that works for you

I can confirm that this change works for me....

> [@scriptator](https://github.com/scriptator) it is good to have a confirmation that this works! I'm still trying to figure out with the PR what is the best way to do without changing...

Minimal reproduction command: `podman run --device nvidia.com/gpu=0 --rm -it --entrypoint python ghcr.io/huggingface/text-generation-inference -c "from torch.utils._triton import triton_backend; triton_backend()"`
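For reference, the one-liner above boils down to probing whether a Triton backend is importable inside the container. A minimal sketch of the same kind of check, using only the public `importlib` machinery rather than the internal `torch.utils._triton` helper (the function name `triton_available` is my own, hypothetical simplification):

```python
import importlib.util

def triton_available() -> bool:
    # Returns True if the `triton` package can be found on the
    # import path, without actually importing (and initializing) it.
    return importlib.util.find_spec("triton") is not None

print(triton_available())
```

If this prints `False` inside the image, the Triton-backed code paths cannot work regardless of the model being loaded, which narrows the error down before rebuilding anything.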