bansky-cl
I ran into this problem too. After running `DDP_main_conditional.sh` I got the checkpoint (xxx.th), but when I run `predict_downstream_condition.py`, the same error occurs. ``` RuntimeError: Error(s) in loading state_dict for...
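For anyone debugging a similar `state_dict` error: the message above is cut off, so the real cause is not known here. One common cause after DDP training is that the checkpoint keys carry a `module.` prefix that the unwrapped model does not expect. A minimal sketch for checking that, assuming plain dict-like state dicts; the helper names `diff_keys` and `strip_prefix` are made up for illustration and are not part of the repo:

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected): keys the model wants but the checkpoint
    lacks, and keys the checkpoint has but the model does not know."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


def strip_prefix(state_dict, prefix="module."):
    """Drop a leading prefix (assumed here to be the one DDP adds) from every key."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Toy example with placeholder values instead of tensors.
model_keys = ["fc.weight", "fc.bias"]
ckpt = {"module.fc.weight": 1, "module.fc.bias": 2}

print(diff_keys(model_keys, ckpt))                # every key mismatches
print(diff_keys(model_keys, strip_prefix(ckpt)))  # no mismatches left
```

With a real model you would pass `model.state_dict().keys()` and the loaded checkpoint dict. If the mismatch is in tensor shapes rather than key names, this check will not show it.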
Same problem here, with the warning `Detected kernel version 5.4.267, which is below the recommended minimum of 5.5.0; this can cause the process to hang.`
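To see whether a machine would trigger that warning, you can compare the running kernel release against the minimum yourself. A rough sketch: the 5.5.0 threshold is copied from the warning text, and the helper names `parse_kernel` and `kernel_is_old` are hypothetical, not taken from any library:

```python
import platform
import re

MIN_KERNEL = (5, 5, 0)  # threshold quoted in the warning message


def parse_kernel(release):
    """Extract (major, minor, patch) from a string like '5.4.267-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch component is treated as 0.
    return tuple(int(g or 0) for g in m.groups())


def kernel_is_old(release, minimum=MIN_KERNEL):
    """True if the release is strictly below the minimum version."""
    return parse_kernel(release) < minimum


if __name__ == "__main__":
    # platform.release() is only a Linux kernel version on Linux hosts.
    if platform.system() == "Linux":
        rel = platform.release()
        print(rel, "-> below minimum" if kernel_is_old(rel) else "-> ok")
```

This only reports the version; it does not change whether the process hangs.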
> @bansky-cl can you please elaborate a bit more on the issue? Does using the single-GPU FT recipe work for you? https://github.com/meta-llama/llama-recipes/blob/main/recipes/finetuning/singlegpu_finetuning.md#how-to-run-it Thanks. I changed my GPU device to make it...
Same problem for me. I solved it by checking my device and the CUDA version of my torch build.
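For reference, a quick way to do that kind of check. This is only a sketch of what "checking" could look like, not necessarily what was run here; the function name `cuda_report` is made up, and it returns early if torch is not installed:

```python
import importlib.util


def cuda_report():
    """Collect the torch build and GPU facts worth comparing against the driver."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    count = torch.cuda.device_count()  # 0 when no usable GPU is visible
    report["device_count"] = count
    report["devices"] = [torch.cuda.get_device_name(i) for i in range(count)]
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Compare `built_for_cuda` with the CUDA version that `nvidia-smi` reports for the driver, and confirm the expected GPUs appear in `devices`.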
> when I use the original loss (without loss mask), I get the following result
>
> ```
> -----------------------------
> | decoder_nll | 1.27e-05 |
> | decoder_nll_q0 | 1.68e-05 |...