Kaustabh Ganguly
I'm also hitting this error. Did you find any solution?
Facing the same damn issue again.
How do I solve this?
```python
from transformers import TextIteratorStreamer
from threading import Thread
from unsloth import FastLanguageModel

# 1) Prepare model for efficient generation
FastLanguageModel.for_inference(model)

# 2) Use a more directive prompt and include BOS...
```
```python
# Block 15: Optional Inference Test
print("\n--- Block 13: Running Basic Inference Test ---")
from transformers import TextIteratorStreamer
from threading import Thread
from unsloth import FastLanguageModel

if training_successful and 'final_model'...
```
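For reference, here is a minimal, self-contained sketch of the streaming-inference pattern the truncated blocks above are using. It assumes `model` and `tokenizer` were already loaded via `FastLanguageModel.from_pretrained(...)`; the prompt text is illustrative, not from the original code.

```python
# Minimal sketch: streaming generation with Unsloth + TextIteratorStreamer.
# Assumes `model` and `tokenizer` already exist from an earlier loading step.
from threading import Thread

from transformers import TextIteratorStreamer
from unsloth import FastLanguageModel

# Switch the Unsloth model into its faster inference mode.
FastLanguageModel.for_inference(model)

prompt = "### Instruction:\nSummarize the findings.\n\n### Response:\n"  # illustrative
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The streamer yields decoded text chunks as tokens are generated.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a background thread and consume the stream here.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256),
)
thread.start()

for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```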
This is not a bug; Unsloth runs on only one GPU. It won't run on multi-GPU systems, so force it to use a single GPU.
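One common way to pin the process to a single GPU is via `CUDA_VISIBLE_DEVICES`; a small sketch below. Note the environment variable must be set before `torch` or `unsloth` is imported.

```python
# Restrict the process to one GPU so Unsloth doesn't see a multi-GPU setup.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU

from unsloth import FastLanguageModel  # import AFTER setting the env var
```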
You have to use a smaller max sequence length. I trained successfully on an H100 with 80 GB of VRAM, using an 8000-token context window (instruction + response), and it consumed 69 GB of VRAM.
VRAM consumption grows quadratically with sequence length in the attention layers, so halving the context window cuts attention memory by roughly 4x.
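If that helps, a minimal sketch of loading with a reduced context window follows; the checkpoint name and lengths are illustrative, and `max_seq_length`/`load_in_4bit` are standard `FastLanguageModel.from_pretrained` parameters.

```python
# Sketch: load with a smaller max_seq_length to fit in less VRAM.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,   # much smaller than 8000 => far less attention VRAM
    load_in_4bit=True,     # quantized weights further reduce memory
)
```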
@shimmyshimmer hey, can you answer just one question please: what is the best open-source model under 3B parameters that is most suitable for the medical or clinical domain...