stormchasingg
`global average pooling` is used in the discriminator in the torch code, as illustrated in the DCGAN paper. But here the discriminator uses a `fully connected` layer instead.
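To make the difference between the two discriminator heads concrete, here is a minimal sketch in plain Python. It is not taken from either code base; the function names, the toy feature map, and the weights are all hypothetical and only illustrate the two reductions.

```python
# Hypothetical sketch: two ways to reduce a discriminator's final feature map
# (channels x height x width). All names and values are illustrative only.

def global_average_pool(feature_map):
    """Average each channel over all spatial positions: one value per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(feature_map, weights, bias=0.0):
    """Flatten the whole map and take one weighted sum: a single logit."""
    flat = [v for channel in feature_map for row in channel for v in row]
    if len(flat) != len(weights):
        raise ValueError("weights must match the flattened feature map size")
    return sum(w * x for w, x in zip(weights, flat)) + bias


# Toy feature map with 2 channels of size 2x2 (assumed values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]

gap_out = global_average_pool(features)        # no learned parameters
fc_out = fully_connected(features, [0.5] * 8)  # 8 weights, tied to the input size
```

The pooling head has no parameters and works for any spatial size, while the fully connected head needs one weight per flattened element, so its size is fixed by the input resolution.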
Expectation: input a black horse, output a zebra whose head orientation differs from that of the original zebra (training). My result: input a black horse, output a zebra...
**Is your feature request related to a problem? Please describe.** `ulysses sp + ring attention` gives good performance in SFT/RL training and is called `hierarchical CP` here. But it...
**Your question** In my case, the Megatron checkpoint save blocked in async_save mode. The file existed in the right place but contained no data: ``` -rw-r--r-- 1 root root 6305 Nov 28 13:47 __0_0.distcp...