weiddeng
Hi ashkan-leo, can you share your command? I'd like to see our diffs. An out-of-memory error is better than the one I got. Thanks!
The mistake I made was using `--LlamaDecoderLayer 'LLaMADecoderLayer'`; it should be `--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'`. However, I still got an OutOfMemoryError. With `--nproc_per_node=1`: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 774.00 MiB (GPU...
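For anyone hitting the same flag typo: a sketch of what the corrected launch might look like (script name, paths, and the other flags here are placeholders from my own setup, not the exact command above):

```shell
# Hypothetical example: the key point is the corrected FSDP wrap flag,
# which must name the actual class in transformers ('LlamaDecoderLayer'),
# not the misspelled 'LLaMADecoderLayer'.
torchrun --nproc_per_node=1 train.py \
  --model_name_or_path ./llama-13b \
  --fsdp "full_shard auto_wrap" \
  --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'
```

Note the class name is case-sensitive; FSDP auto-wrap silently fails to match (or errors) if it doesn't correspond to a module class defined in the model.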
Hi @felri in your instruction fine tune data, I see each instruction is prefixed by "In Stardew Valley, " What is the rationale? Have you tried without the prefix? Thanks!
"Since the special_token_map.json is auto-generated" - interesting, it was not generated in my case. But you are right on that I used this leaked version of llama-13b :sweat_smile:
I'm experiencing the same issue.
`gradient_accumulation_steps = batch_size // micro_batch_size`
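To spell out what this line does: it keeps the effective batch size fixed while only ever holding a small micro-batch in GPU memory at once. A minimal sketch with hypothetical values (128 and 4 are illustrative, not from the script):

```python
# Hypothetical values illustrating gradient accumulation:
batch_size = 128          # desired effective batch size seen by the optimizer
micro_batch_size = 4      # what actually fits on the GPU per forward/backward pass

# Number of micro-batch gradients summed before each optimizer step.
gradient_accumulation_steps = batch_size // micro_batch_size

# When batch_size is divisible by micro_batch_size, the effective
# batch size is recovered exactly.
effective_batch = micro_batch_size * gradient_accumulation_steps
print(gradient_accumulation_steps, effective_batch)  # 32 128
```

So lowering `micro_batch_size` to dodge CUDA OOM errors does not change the training math, only how the batch is split across steps.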