
Results: 6 issues by mofanke

fix https://github.com/kubeflow/notebooks/issues/106

size/S
ok-to-test

Operating System: Windows. GPU: NVIDIA with 6 GB memory. Description: While switching between Mistral 7B and Codellama 7B, I noticed a decrease in GPU available memory for layers offloaded to the...

bug

[1704891429] sampled token: 29896: '1'
[1704891429] update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
[1704891429] update_slots : failed to find free...

bug
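
The log above shows the server repeatedly shrinking `n_batch` when the KV cache has no free span large enough for the batch. A minimal sketch of that retry behavior (illustrative only; the function and parameter names here are hypothetical, not llama.cpp's actual code):

```python
def fit_batch(n_tokens, kv_free_slots, n_batch=512):
    """Halve the batch size until the pending tokens fit in the free
    KV-cache space, mirroring the 'retrying with smaller n_batch' log.
    Returns the batch size that fits, or 0 if the cache is full."""
    while n_batch >= 1:
        want = min(n_batch, n_tokens)
        if kv_free_slots >= want:
            return want          # this many tokens can be decoded now
        n_batch //= 2            # retry with smaller n_batch
    return 0                     # not even one token fits: cache exhausted
```

If this returns 0 persistently, the fix is usually a larger context/cache or freeing finished slots, not further batch shrinking.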

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged...

enhancement

https://mistral.ai/news/codestral-mamba/

You can deploy Codestral Mamba using the [mistral-inference](https://github.com/mistralai/mistral-inference/releases/tag/v1.2.0) SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba). For local...

new model

What does the --max-length parameter actually determine when training a draft model? If I set --max-length 2048, does it mean that the maximum context length for the draft model (including...