
Results: 6 issues by mofanke

fix https://github.com/kubeflow/notebooks/issues/106

size/S
ok-to-test

Operating System: Windows. GPU: NVIDIA with 6 GB memory. Description: While switching between Mistral 7B and Codellama 7B, I noticed a decrease in GPU available memory for layers offloaded to the...

bug

[1704891429] sampled token: 29896: '1'
[1704891429] update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
[1704891429] update_slots : failed to find free...

bug
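
The log above shows the server repeatedly shrinking `n_batch` when the KV cache has no free span large enough for the batch. A minimal sketch of that retry behavior (illustrative only; the function and parameter names here are hypothetical, not llama.cpp's actual code):

```python
def fit_batch(n_tokens, kv_free_slots, n_batch=512):
    """Halve the batch size until the pending tokens fit in the free
    KV-cache space, mirroring the 'retrying with smaller n_batch' log.
    Returns the batch size that fits, or 0 if the cache is full."""
    while n_batch >= 1:
        want = min(n_batch, n_tokens)
        if kv_free_slots >= want:
            return want          # this many tokens can be decoded now
        n_batch //= 2            # retry with smaller n_batch
    return 0                     # not even one token fits: cache exhausted
```

If this returns 0 persistently, the fix is usually a larger context/cache or freeing finished slots, not further batch shrinking.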

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged...

enhancement

https://mistral.ai/news/codestral-mamba/

You can deploy Codestral Mamba using the [mistral-inference](https://github.com/mistralai/mistral-inference/releases/tag/v1.2.0) SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba). For local...

new model

What does the --max-length parameter actually determine when training a draft model? If I set --max-length 2048, does it mean that the maximum context length for the draft model (including...