cduk

Results 11 issues of cduk

### Describe the bug Starting with the prompt "The Eiffel Tower is " continues "324 meters high and weighs 7,300 tons. It was built in 1889 for the Universal Exhibition...

bug

I'm trying to get AIChat to work with a shortcut so that, after typing a chat message and pressing the hotkey in insert mode, it runs AIChat. So...

Instead of running an instance per model in the Dockerfile, can a list of models be provided at instantiation, with the model then chosen via the API request? The...

Given that we have only Llama 3 70B and 8B, it would be useful to have a Tiny Llama based on the Llama 3 tokenizer so that we can use...

### Your current environment Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS...

bug

There are many other parameters that are not passed through. Maybe arbitrary options can be passed through. Most important are stop tokens, early stopping, repetition etc. SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0,...

With Qwen3 you can dynamically control whether thinking is used or not. In some cases from the CLI, I want thinking to happen but take only the final output to...

enhancement

### Feature request Is there a way of receiving the embeddings back in BQ format? Right now, I receive the full precision embedding and quantize it in the client, but...

The enterprise version has some additional features for auto-repair. Can you include more details on this in the wiki, including whether this involves any changes to the on-disk format (if...

### Feature request Currently compile options allow specifying compute cap down to 75. Can Pascal generation cc 6.0/6.1 also be supported? ``` Dockerfile-cuda:50 -------------------- 49 | 50 | >>> RUN...