Johannes Vass
The pull request #266 fixes the problem (in the sense that the server can successfully load the new model again).
Unfortunately, that's all I have at the moment. If the problem occurs again, I can try to gather more logs and describe more precisely what my last actions were.
This PR solves my issue described [here](https://github.com/huggingface/text-generation-inference/issues/2838#issuecomment-2586691348), thank you!
I just noticed that you changed the implementation between the time I built the image and my previous message. Should I test again?
Do you have a way of reproducing the error more quickly than building the entire image from scratch every time? And do you manage to get to the exact compiler...
I get the same errors with the latest Docker image. So far I have tested Mixtral 8x7B and Llama 3.3, and both failed with the same error. In short, this command ```...
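(The exact command above is truncated. For context, a launch following the standard TGI Docker pattern looks roughly like the sketch below; the model ID, port, and volume mount are illustrative assumptions, not the commenter's exact invocation.)

```shell
# Sketch of a standard TGI launch (illustrative; the commenter's exact
# command is truncated above). Assumes the NVIDIA Container Toolkit
# provides --gpus support.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v "$PWD/data:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id mistralai/Mixtral-8x7B-Instruct-v0.1
```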
> [@scriptator](https://github.com/scriptator) maybe you can try building the image that I have here [#2848](https://github.com/huggingface/text-generation-inference/pull/2848) and see if that works for you

I can confirm that this change works for me....
> [@scriptator](https://github.com/scriptator) it is good to have a confirmation that this works! I'm still trying to figure out, in the PR, the best way to do this without changing...
Minimal reproduction command: `podman run --device nvidia.com/gpu=0 --rm -it --entrypoint python ghcr.io/huggingface/text-generation-inference -c "from torch.utils._triton import triton_backend; triton_backend()"`
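For readers on Docker rather than Podman, an equivalent invocation might look like the sketch below; it assumes the NVIDIA Container Toolkit and pins the image tag to `latest` for definiteness. The Python one-liner makes Triton resolve its active compiler backend, so it raises immediately on a broken Triton/`ptxas` setup and exits quietly otherwise.

```shell
# Equivalent reproduction under Docker (a sketch; assumes the NVIDIA
# Container Toolkit, image tag is an assumption). The one-liner forces
# Triton to resolve its compiler backend, which fails fast on a broken
# install.
docker run --gpus all --rm -it \
    --entrypoint python \
    ghcr.io/huggingface/text-generation-inference:latest \
    -c "from torch.utils._triton import triton_backend; triton_backend()"
```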