noob-ctrl
The version information is as follows:
- DeepSpeed 0.8.1
- transformers 4.26.1

## Problem
When I use Trainer with DeepSpeed, the number of trainable parameters is 0. Like this: ...
If I want to use the new PyTorch 2.0 feature `torch.compile`, what should I do? Where should I put the following code, or should I just pass a command-line parameter? ``` model...
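About the two forward passes
Hello, it looks like the code performs only one forward pass. In that case, don't x and the positive example use the same dropout? Wouldn't their outputs then be identical?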
Does Megatron-Core support LLaMA models?
I tried setting the fp16 parameter to True and to False. Why does training take longer when it is set to True?
I followed these steps: - run `docker build . -t megablocks-dev` - and then `bash docker.sh` to launch the container. When I ran `moe_46m_8gpu.sh` to test, it reported the...