seansong

Results: 44 comments of seansong

I also noticed that because of the all-reduce before the forward pass, it's not recommended to use FSDP for inference. Does this mean FSDP inference isn't supported so far by...

@mreso Thank you for looking into this issue. In the meantime, is there a workaround for using finetuned FSDP checkpoints for inference? Thanks.
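For anyone else stuck here, the usual workaround is to consolidate the sharded FSDP state dicts into a single full state dict and load that for inference. The sketch below shows only the save/reload round trip in plain single-process PyTorch (no actual FSDP sharding, toy model, hypothetical file name), to illustrate what the consolidated checkpoint needs to look like on the inference side:

```python
import torch
import torch.nn as nn

# Toy stand-in for the fine-tuned model architecture.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# With real FSDP, this dict would first be gathered across ranks
# (via FSDP's full-state-dict APIs); here it is already complete.
torch.save(model.state_dict(), "consolidated.pt")

# Inference side: rebuild the same architecture, then load the single file.
inference_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
inference_model.load_state_dict(torch.load("consolidated.pt"))
inference_model.eval()

# The reloaded weights reproduce the original outputs exactly.
x = torch.randn(2, 8)
with torch.no_grad():
    assert torch.equal(model(x), inference_model(x))
```

The key point is that the inference code must instantiate the exact same architecture before calling `load_state_dict`, otherwise the consolidated checkpoint will fail with shape or key mismatches.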

@mreso Is there an update on this? Thanks

Could we prioritize this? If the checkpoints don't work, how can we use the fine-tuned FSDP checkpoint for inference?

Hey @mreso, I found this only happens for Llama 3 and 3.1 models; inference with checkpoints from FSDP Llama 2 is OK. The architectures of Llama 3 and Llama 2 are pretty similar. Do...

Thanks @wukaixingxp for fixing this. For some reason I got this issue: `Processing dataset: 0%| | 0/49402 [00:00

@wukaixingxp I tried both meta-llama/Meta-Llama-3-8B-Instruct and meta-llama/Llama-3.1-8B-Instruct; both have the same issue as before. Here are my steps: ![image](https://github.com/user-attachments/assets/984a65c1-0871-424a-82f8-ee92454c63ab) ``` python ./src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path ./fsdp_fine_tune_results/fsdp_model_finetuned_1_8_8B/fine-tuned-Meta-Llama-3-8B-Instruct --consolidated_model_path ./fsdp_fine_tune_results/fsdp_model_finetuned_1_8_hf...
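For readers following along, the truncated conversion command above would in full form look roughly like the sketch below. The paths are placeholders and the `--HF_model_path_or_name` flag is an assumption about the llama-recipes converter script, not taken from the original comment:

```shell
# Consolidate sharded FSDP checkpoints into a single Hugging Face checkpoint.
# All paths below are hypothetical placeholders.
python ./src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py \
  --fsdp_checkpoint_path ./fsdp_fine_tune_results/fine-tuned-checkpoint-dir \
  --consolidated_model_path ./consolidated_hf_model \
  --HF_model_path_or_name meta-llama/Meta-Llama-3-8B-Instruct
```

The base model name must match the architecture used for fine-tuning so the converter can rebuild the model before loading the sharded weights.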

@wukaixingxp Thanks for the updates. I can't find the `meta-llama/Meta-Llama-3.1-8B-Instruct` model card on Hugging Face, but there is `https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct`. I wonder if they are the same model with different names?

@wukaixingxp Thanks for the prompt reply. Here is the command I used (via Slurm): > srun -l docker exec -w /root/ fsdp torchrun --nnodes 1 --nproc_per_node 8 --rdzv_id 1599...