speed
I also hit a loss discrepancy when resuming from a checkpoint with FSDP, after setting the vocab size to a value that is not divisible by the world size. Since the loss quickly...
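To make "not divisible by the world size" concrete, here is a minimal sketch of how many rows of a vocab-sized tensor each rank would hold. It assumes a `torch.chunk`-style split along dim 0 with ceil-sized chunks; the helper name `dim0_shard_rows` and the sizes used are hypothetical, and whether this layout is related to the discrepancy is not established here.

```python
import math


def dim0_shard_rows(num_rows: int, world_size: int) -> list[int]:
    """Rows held by each rank under a ceil-sized, chunk-style split of dim 0.

    Assumption: every rank gets ceil(num_rows / world_size) rows until the
    rows run out, so trailing ranks may get fewer rows (or none).
    """
    chunk = math.ceil(num_rows / world_size)
    rows = []
    for rank in range(world_size):
        start = min(rank * chunk, num_rows)
        end = min(start + chunk, num_rows)
        rows.append(end - start)
    return rows


if __name__ == "__main__":
    # Hypothetical vocab sizes: one divisible by 8 ranks, one not.
    for vocab_size in (2048, 2049):
        rows = dim0_shard_rows(vocab_size, world_size=8)
        # Padding each rank would need to match the largest shard.
        padding = [max(rows) - r for r in rows]
        print(vocab_size, rows, "padding:", padding)
```

With a divisible size every rank holds the same number of rows. With an indivisible size the last rank holds fewer, so any code path that assumes equal shard sizes has to pad or otherwise handle the short shard.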
Hi @wwwjn, thank you for your support! I’ve prepared a reproduction script based on the [latest main branch](https://github.com/pytorch/torchtitan/commit/a44dff1a41f6c0d8e504919ce4b1b50d05102f01), along with some instructions. Here is the [code](https://github.com/speed1313/torchtitan/commit/08fb43479eedc5016383cd4db628b9d38465a25d#diff-78cc79291e219fdfd73f7b1c7b5d442d1346f821b8add32e0e02d62597fe0ee5). I ran it with...