Alfredo Solano
Is the model at `/home/ubuntu/model/` the 7B, 13B, or 34B version? You may need to set the `--nproc_per_node` parameter to 1, 2, or 4, respectively. (It is stated [here](https://github.com/facebookresearch/codellama#inference) in...
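For reference, the launch command documented in the Code Llama README looks roughly like the sketch below; the script name and paths follow the README's example, so adjust `--ckpt_dir` and `--tokenizer_path` to your local layout:

```bash
# Model-parallel world size must match the checkpoint:
# 7B -> --nproc_per_node 1, 13B -> 2, 34B -> 4
torchrun --nproc_per_node 1 example_completion.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```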
Good to hear! IIRC it is not a quick fix to change the model parallel configuration, as the code expects the exact name and number of layers indicated in the...
I see. I'm afraid I am not familiar with that kind of setup, but there is already a HuggingFace version of Code Llama, so you may try running that instead...
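If it helps, here is a minimal sketch of running the Hugging Face conversion with the `transformers` library. The `codellama/CodeLlama-7b-hf` checkpoint is the one published on the Hub (13B/34B variants exist too), and `device_map="auto"` assumes `accelerate` is installed:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-hf"  # swap for the 13b/34b variants as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places/shards the weights across available GPUs
# (requires the `accelerate` package)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```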
Thanks for your suggestion, @leekwoon. I no longer have access to the DGX station, but I tried the change in an AWS instance. For now it looks good; the vae.json...
FWIW, I did find that step 3 of the GPU jobs showed some problems with the patched code, so after a bit of failed troubleshooting I just decided to go...
Hi, @Sengxian, thank you for your suggestion. Unfortunately, we have tried with 256, 384, and 512, and the results are still the same (it crashes at the same spot). Any other hints...
Fair enough! We tried that, and while it seems to work, it also increases the length of the generated sequence (most of it repetitions or garbage) to the point...