fairydreaming
@kyteinsky lol, didn't notice that all that time, sorry
@Sadeghi85 One thing that is missing in your code is the preparation of the input to the `decode()` call. Check how it's done in the llama-cli source code: https://github.com/ggerganov/llama.cpp/blob/b841d0740855c5af1344a81f261139a45a2b39ee/examples/main/main.cpp#L536-L552 So before calling...
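In rough terms, the missing step looks something like this (a C++ sketch against the 2024-era llama.cpp C API from the commit linked above; signatures have changed since, so treat it as an outline rather than drop-in code):

```cpp
#include "llama.h"

#include <string>
#include <vector>

// Tokenize a prompt into llama tokens (sketch; llama_tokenize signature
// as in the 2024-era API, where it still took a llama_model pointer).
static std::vector<llama_token> tokenize(llama_model * model, const std::string & text) {
    // upper bound: one token per byte plus room for special tokens
    std::vector<llama_token> tokens(text.size() + 8);
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n);
    return tokens;
}

void run_prompt(llama_model * model, llama_context * ctx, const std::string & prompt) {
    std::vector<llama_token> tokens = tokenize(model, prompt);
    // pos 0, seq_id 0: the whole prompt goes in as a single sequence
    llama_batch batch = llama_batch_get_one(tokens.data(), (int) tokens.size(), 0, 0);
    if (llama_decode(ctx, batch) != 0) {
        // handle the error
    }
}
```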
I see there are serious problems with using T5, so I added a branch with a high-level example of inference with a T5 model: https://github.com/fairydreaming/llama-cpp-python/tree/t5 There is also a second branch...
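For context, T5 inference in llama.cpp is a two-stage process, which is roughly what those examples do: run the encoder once over the input, then decode autoregressively starting from the model's decoder start token. A simplified sketch, reusing the `tokenize` helper from the sketch above (greedy sampling, no error handling, same API era):

```cpp
#include <algorithm>

void run_t5(llama_model * model, llama_context * ctx, const std::string & prompt) {
    std::vector<llama_token> inp = tokenize(model, prompt);

    // 1. run the encoder over the full input sequence
    llama_batch batch = llama_batch_get_one(inp.data(), (int) inp.size(), 0, 0);
    llama_encode(ctx, batch);

    // 2. decode token by token, starting from the decoder start token
    llama_token tok = llama_model_decoder_start_token(model);
    for (int pos = 0; pos < 512; pos++) { // arbitrary output length cap
        batch = llama_batch_get_one(&tok, 1, pos, 0);
        llama_decode(ctx, batch);

        // greedy sampling: pick the highest-logit token
        const float * logits = llama_get_logits(ctx);
        tok = (llama_token) (std::max_element(logits, logits + llama_n_vocab(model)) - logits);
        if (tok == llama_token_eos(model)) {
            break;
        }
        // ... detokenize and print tok ...
    }
}
```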
@yugaljain1999 Yes, you can pass multiple prompts. I don't know how it works in the llama-cpp-python high-level API, but in llama.cpp (low-level API) you do it by creating a batch containing...
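Roughly, such a batch can be built like this (a sketch using the llama.cpp C API's `llama_batch` fields and the `tokenize` helper from the earlier sketch; two hard-coded prompts, no error handling):

```cpp
void decode_two_prompts(llama_model * model, llama_context * ctx) {
    std::vector<std::vector<llama_token>> prompts = {
        tokenize(model, "first prompt"),
        tokenize(model, "second prompt"),
    };

    int n_total = 0;
    for (const auto & p : prompts) n_total += (int) p.size();

    // one slot per token, no embeddings, at most one seq id per token
    llama_batch batch = llama_batch_init(n_total, 0, 1);
    for (size_t s = 0; s < prompts.size(); s++) {
        for (size_t i = 0; i < prompts[s].size(); i++) {
            const int j = batch.n_tokens++;
            batch.token[j]     = prompts[s][i];
            batch.pos[j]       = (llama_pos) i;    // position within its own sequence
            batch.n_seq_id[j]  = 1;
            batch.seq_id[j][0] = (llama_seq_id) s; // sequence id = which prompt
            batch.logits[j]    = i == prompts[s].size() - 1; // logits for last token only
        }
    }

    llama_decode(ctx, batch); // both prompts are processed in a single call
    llama_batch_free(batch);
}
```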
@yugaljain1999 T5 models are still not supported in llama-server.
@Vedapani0402 Some time ago I created these two llama-cpp-python branches with working high-level and low-level T5 examples: https://github.com/fairydreaming/llama-cpp-python/tree/t5 https://github.com/fairydreaming/llama-cpp-python/tree/fix-low-level-examples If you follow them, everything should work just fine. Note that...
@Vedapani0402 You are right, in the meantime the llama.cpp API changed so much that it was impossible for this to still work correctly. I updated the low-level examples in the [fix-low-level-examples](https://github.com/fairydreaming/llama-cpp-python/tree/fix-low-level-examples) branch to...
> Hi [@fairydreaming](https://github.com/fairydreaming), thank you for updating the code. > > But there are two issues I am currently facing: > > 1. The updated code is working as expected...
@Vedapani0402 As far as I remember, I based the code on an existing llama.cpp example, and I have no idea if it's ready for handling multiple inputs. But maybe instead of freeing/recreating...
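One way to read that suggestion: keep a single context and clear its KV cache between inputs instead of recreating it. A minimal sketch, assuming `llama_kv_cache_clear` from the llama.cpp versions of that period (newer versions renamed this function) and the `run_t5` helper from the earlier sketch:

```cpp
void run_many(llama_model * model, llama_context * ctx,
              const std::vector<std::string> & inputs) {
    for (const std::string & prompt : inputs) {
        llama_kv_cache_clear(ctx);  // drop all state left over from the previous input
        run_t5(model, ctx, prompt); // encode + decode as in the earlier sketch
    }
}
```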
@Vedapani0402 Sorry, I have no experience with running small quantized models (I usually run them at f16), so I can't help there.