Leonardo Perelli
Thank you for both the comments! Sorry, I didn't specify, but the version is the latest, 0.11. By the way, the strange thing is that the message is always "index...
@MaartenGr Thank you for your help. I am sharing the code; unfortunately I cannot share the data as well. There is nothing special about it and I have used this snippet countless times, however...
Hello, thanks for your answer. Indeed, the first call to fit completed without problems, and there are no apparent problems with the clusters, in the sense that no edge...
Hey! I interpret a linear projection as applying a matrix, while I would call the transformation Ax+b an affine transformation. Anyway, I guess it's just a tiny detail of the implementation!
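The distinction mentioned above can be sketched in a few lines (an illustrative example, not code from the project under discussion): a linear map is x ↦ Ax and always fixes the origin, while an affine map x ↦ Ax + b with b ≠ 0 does not.

```python
def linear(A, x):
    # Matrix-vector product Ax: a linear map.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def affine(A, b, x):
    # Ax + b: an affine map (linear part plus a translation).
    return [y_i + b_i for y_i, b_i in zip(linear(A, x), b)]

A = [[2.0, 0.0], [0.0, 3.0]]
b = [1.0, -1.0]
origin = [0.0, 0.0]

print(linear(A, origin))     # [0.0, 0.0] -- the origin stays fixed
print(affine(A, b, origin))  # [1.0, -1.0] -- shifted by b, so not linear
```

This is why "linear projection" usually refers to the matrix alone, while the version with the bias term is called affine.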
Hi all. Wondering why QLoRA only works on GPUs but not on CPUs? @artidoro Thanks!
> Problem with `max_tokens` less than `n_ctx`. I think we need to add an assert to ensure the context is bigger than the generated text size. `max_tokens` is 1024 while `n_ctx` is 200...
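The suggested check could look roughly like this (a hedged sketch with illustrative names, not the library's actual API): validate that the requested completion length fits the context window before generating.

```python
def check_generation_budget(n_ctx: int, prompt_tokens: int, max_tokens: int) -> int:
    """Return a max_tokens value that fits the context window, or raise if impossible.

    Illustrative helper: `n_ctx` is the context size, `prompt_tokens` the tokens
    already consumed by the prompt, `max_tokens` the requested completion length.
    """
    available = n_ctx - prompt_tokens
    if available <= 0:
        raise ValueError(
            f"prompt ({prompt_tokens} tokens) already fills the context (n_ctx={n_ctx})"
        )
    # Clamp the request to what fits; a hard assert would be the stricter variant.
    return min(max_tokens, available)

# The case quoted above: max_tokens=1024 with n_ctx=200 gets clamped.
print(check_generation_budget(n_ctx=200, prompt_tokens=50, max_tokens=1024))  # 150
```

Whether to clamp silently or fail loudly is a design choice; the comment above argues for the explicit assert so the mismatch surfaces immediately.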
Did you try increasing the max output tokens (in this test, try setting it to e.g. 20k or so, just to be sure)? Does this solve the issue? As long...
Hey @xhchen10, did you have any luck finding out the reason? I spotted that too and it seems strange.
Thanks a lot Jianyuan, now it makes sense! :)
Same issue, no logprobs returned. I was getting an error when using the `__call__` method of `LLama` with `logits_all=True`. I had to switch to the `create_chat_completion` method, no more...