Pete Tanski
@miekg and @zosocanuck, please include me in the conversation. I may test this first in my fork at [github.com/pdtgct/pkcs11](https://github.com/pdtgct/pkcs11).
Here is an example `generation_config.json`: ```json { "_from_model_config": true, "bos_token_id": 0, "eos_token_id": 0, "pad_token_id": 1, "max_new_tokens": 128, "min_new_tokens": 1, "penalty_alpha": null, "repetition_penalty": 1.0, "do_sample": true, "temperature": 0.6, "top_k": 50, "top_p":...
@miekg I would be happy to help update the Go version and also merge your changes with mine here.
I can confirm seeing this issue in `djl-inference:0.29.0-tensorrtllm0.11.0-cu124`. Steps to reproduce: Send a POST request with the `stop` parameter: ```json { "inputs": "user\nYou are rolling a 12-sided dice twice.\n\nQuestion: Can...
Thanks, @sindhuvahinis - will try to find some time to confirm.