Alfredo Solano
Is the model at `/home/ubuntu/model/` the 7B, 13B, or 34B version? You may need to set the `--nproc_per_node` parameter to 1, 2, or 4, respectively. (It is stated [here](https://github.com/facebookresearch/codellama#inference) in...
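For reference, the launch command documented in the Code Llama README looks roughly like the sketch below; the script name and paths follow the README's example, so adjust `--ckpt_dir` and `--tokenizer_path` to your local layout:

```bash
# Model-parallel world size must match the checkpoint:
# 7B -> --nproc_per_node 1, 13B -> 2, 34B -> 4
torchrun --nproc_per_node 1 example_completion.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```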
Good to hear! IIRC it is not a quick fix to change the model parallel configuration, as the code expects the exact name and number of layers indicated in the...
I see. I'm afraid I am not familiar with that kind of setup, but there is already a HuggingFace version of Code Llama, so you may try running that instead...
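If it helps, here is a minimal sketch of running the Hugging Face conversion with the `transformers` library. The `codellama/CodeLlama-7b-hf` checkpoint is the one published on the Hub (13B/34B variants exist too), and `device_map="auto"` assumes `accelerate` is installed:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-hf"  # swap for the 13b/34b variants as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places/shards the weights across available GPUs
# (requires the `accelerate` package)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```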
Thanks for your suggestion, @leekwoon. I no longer have access to the DGX station, but I tried the change in an AWS instance. For now it looks good; the vae.json...
FWIW, I did find that step 3 of the GPU jobs showed some problems with the patched code, so after a bit of failed troubleshooting I just decided to go...
Hi, @Sengxian, thank you for your suggestion. Unfortunately, we have tried with 256, 384, and 512, and the results are still the same (it crashes at the same spot). Any other hints...
Fair enough! We tried that, and while it seems to work, it also increases the length of the generated sequence (most of it repetitions or garbage) to the point...