gladjoyhub
> Was the model actually training? Can you add the log here? Where or how do I get the log file for LoRA training?
I got this from LoRA training: `Iter 1: Val loss 2.929, Val took 1494.344s`, followed by `libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 4352 bytes.` But...
> What model are you training? It looks like you ran out of memory... > > By the log, I just meant the output of the `python lora.py ...`. model...
> > But in train.jsonl, there is a line that's very long, like 2000 words > > Is that the default `train.jsonl` or a custom one? You should split those...
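Splitting those long lines can be scripted. A minimal sketch, assuming each `train.jsonl` row is a `{"text": ...}` record as in the mlx-examples LoRA data, and that a fixed word-count cap (`max_words`, a tunable guess) is an acceptable way to chunk:

```python
import json

def split_long_examples(in_path, out_path, max_words=300):
    """Rewrite a JSONL file, chunking any overly long "text" entries
    into consecutive windows of at most max_words words."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            words = json.loads(line)["text"].split()
            for i in range(0, len(words), max_words):
                chunk = " ".join(words[i:i + max_words])
                fout.write(json.dumps({"text": chunk}) + "\n")
```

A 2000-word line would then become seven rows of at most 300 words each, which keeps the per-example memory footprint bounded.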
Now I am using the default train.jsonl, and speeds are around 200 tokens/sec, which is good. But then at Iter 130, speed drops to 1.4 tokens/sec. Asitop says RAM Usage:...
No, I didn't do anything else; GPU usage was under 50%. Memory swap was probably dragging it down, but there is plenty of DRAM, so why would it swap? One suspicion is...
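One way to see whether swap is involved is to log peak resident memory next to the throughput number. A sketch using only the standard library (`throughput_report` is a hypothetical helper, not part of `lora.py`; note `ru_maxrss` units differ between macOS and Linux):

```python
import resource
import sys
import time

def throughput_report(tokens_done, t_start):
    """Return tokens/sec alongside peak RSS, so a swap-induced
    slowdown shows up next to a memory number."""
    elapsed = time.time() - t_start
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but kilobytes on Linux
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return f"{tokens_done / elapsed:.1f} tok/s, peak RSS {peak_bytes / 1e9:.2f} GB"
```

If tokens/sec collapses while peak RSS keeps climbing toward the machine's total RAM, swapping is the likely culprit even when a monitor reports DRAM as "plenty".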
@awni, I can't find gguf-related Python code in the current examples directory. I did find it here: https://github.com/ml-explore/mlx-examples/tree/9804e1155c961c864971a5db750030ebcc3db8f2/llms/gguf_llm But `models.py` has `pdb` trace code in it, and reporting...
> What model are you using, @gladjoyhub ? Looks like it doesn't have `tokenizer.ggml.tokens` or `tokenizer.ggml.merges`, so the tokenizer cannot be initialized. I only tested this code with tinyllama. @jbochi...
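A quick sanity check on the metadata before building the tokenizer would make this failure mode clearer. A sketch, where `check_tokenizer_metadata` is a hypothetical helper and `metadata` stands for whatever dict your GGUF reader returns:

```python
def check_tokenizer_metadata(metadata):
    """Return the tokenizer keys missing from a GGUF metadata dict.
    An empty list means the tokenizer can be initialized."""
    required = ("tokenizer.ggml.tokens", "tokenizer.ggml.merges")
    return [k for k in required if k not in metadata]
```

Calling this right after loading the file, and raising a readable error listing the missing keys, would be friendlier than failing deep inside tokenizer construction.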
Integer primary keys in various databases usually start with 1.
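SQLite is a handy illustration: an `INTEGER PRIMARY KEY` aliases the rowid, and the first inserted row gets id 1, not 0. A minimal sketch using Python's built-in `sqlite3`:

```python
import sqlite3

# In SQLite, INTEGER PRIMARY KEY aliases the rowid;
# the first inserted row receives id 1, not 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
cur = conn.execute("INSERT INTO t (name) VALUES ('first')")
print(cur.lastrowid)  # → 1
```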