ZHANG Bowen

21 comments by ZHANG Bowen

Sorry for the confusion; we should have added a short help message for this argument. DA refers to Distribution Alignment, a technique proposed in the ReMixMatch paper. > To...
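In ReMixMatch, distribution alignment rescales each prediction q on an unlabeled example by p(y) / p̃(y) and then renormalizes. Here p(y) is the class distribution of the labeled data and p̃(y) is a running average of the model's predictions on unlabeled data. A minimal plain-Python sketch follows. It is not this repository's implementation. The class name, the method names and the default window of 128 batches are assumptions for illustration.

```python
from collections import deque
from typing import List


class DistributionAlignment:
    """Hypothetical sketch of ReMixMatch-style distribution alignment (DA)."""

    def __init__(self, labeled_marginal: List[float], window: int = 128):
        # p(y): class marginal of the labeled set.
        self.p_labeled = labeled_marginal
        # Mean prediction of each of the last `window` unlabeled batches
        # (window size is an assumption, adjust as needed).
        self.history = deque(maxlen=window)

    def update(self, batch_probs: List[List[float]]) -> None:
        """Record the mean predicted distribution of one unlabeled batch."""
        n = len(batch_probs)
        k = len(self.p_labeled)
        self.history.append([sum(p[c] for p in batch_probs) / n for c in range(k)])

    def align(self, probs: List[float]) -> List[float]:
        """Return Normalize(q * p(y) / p_tilde(y)) for one prediction q."""
        if not self.history:
            raise RuntimeError("call update() with at least one batch first")
        k = len(self.p_labeled)
        # p_tilde(y): running average of the model's unlabeled predictions.
        p_model = [sum(h[c] for h in self.history) / len(self.history) for c in range(k)]
        scaled = [probs[c] * self.p_labeled[c] / p_model[c] for c in range(k)]
        total = sum(scaled)
        return [s / total for s in scaled]
```

With a balanced labeled marginal `[0.5, 0.5]` and a model that currently favors class 0 (batch mean `[0.7, 0.3]`), `align([0.8, 0.2])` moves the prediction toward the labeled marginal. The result is `[12/19, 7/19]`, roughly `[0.632, 0.368]`.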

Hi, did this happen before training even started, or during training? If it's the former, make sure the distributed arguments are set correctly (i.e. `world-size`, `rank`, `dist-url`), in...
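A quick way to catch these mistakes is to validate the three arguments before any process group is created. The sketch below assumes the flag names follow the PyTorch ImageNet example convention (`--world-size`, `--rank`, `--dist-url`). The actual training script may spell them differently, and `check_dist_args` is a hypothetical helper.

```python
import argparse
from urllib.parse import urlparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names are assumed, modeled on the PyTorch ImageNet example.
    parser = argparse.ArgumentParser()
    parser.add_argument("--world-size", type=int, default=1, help="number of participating nodes/processes")
    parser.add_argument("--rank", type=int, default=0, help="rank of this node/process")
    parser.add_argument("--dist-url", type=str, default="tcp://127.0.0.1:23456", help="rendezvous URL of the master")
    return parser


def check_dist_args(args: argparse.Namespace) -> None:
    """Hypothetical helper: raise ValueError on common misconfigurations."""
    if args.world_size < 1:
        raise ValueError("--world-size must be >= 1")
    # Every participant needs a distinct rank in [0, world_size - 1].
    if not 0 <= args.rank < args.world_size:
        raise ValueError(f"--rank must be in [0, {args.world_size - 1}], got {args.rank}")
    url = urlparse(args.dist_url)
    # A tcp:// rendezvous needs both the master's address and a port.
    if url.scheme == "tcp" and (url.hostname is None or url.port is None):
        raise ValueError("--dist-url must look like tcp://<master-ip>:<port>")
```

All nodes must pass the same `--world-size` and the same `--dist-url` (pointing at the rank-0 node), and only `--rank` should differ between them.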

@Ajaypatel1234 According to your description, did you set rank=1 on both machines (nodes)? In your case, you should set rank=0 on the first node and rank=1 on the second.
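The usual convention is that each node owns a contiguous block of global ranks: `global_rank = node_rank * nproc_per_node + local_rank`. With one process per node, this gives rank 0 on the first node and rank 1 on the second. The helper names below are hypothetical. They show why setting rank=1 on both nodes leaves rank 0 unclaimed, so the rendezvous never completes.

```python
from typing import List


def global_ranks(node_rank: int, nproc_per_node: int) -> List[int]:
    """Global ranks owned by one node: node_rank * nproc_per_node + local_rank."""
    return [node_rank * nproc_per_node + local_rank for local_rank in range(nproc_per_node)]


def covers_world(node_ranks: List[int], nproc_per_node: int) -> bool:
    """True if the nodes together claim every rank 0..world_size-1 exactly once."""
    world_size = len(node_ranks) * nproc_per_node
    claimed = [r for n in node_ranks for r in global_ranks(n, nproc_per_node)]
    return sorted(claimed) == list(range(world_size))
```

`covers_world([0, 1], 1)` is `True` (rank 0 on the first node, rank 1 on the second), while `covers_world([1, 1], 1)` is `False`.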

I doubt whether they'll open-source it at all 👎

Nine months have passed and I still see this error.

Unfortunately, I don't think Slurm is installed on our cluster, and I don't have root privileges to configure it. Are there any other startup methods, e.g. using torchrun...
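For reference, torchrun needs no Slurm or root access. You run the same command on every node, changing only `--node_rank`, and torchrun exports `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` to each worker. The sketch below reads those variables. It assumes the training script can be adapted to take them from the environment; `train.py`, the GPU count and the address/port placeholders are illustrative only.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Launch on each node (node_rank=0 on the master, 1 on the second node):
#   torchrun --nnodes=2 --nproc_per_node=<gpus> --node_rank=<0|1> \
#            --master_addr=<master-ip> --master_port=<port> train.py
# Inside train.py, torch.distributed.init_process_group(backend="nccl",
# init_method="env://") then picks these variables up automatically.


class DistEnv(NamedTuple):
    rank: int
    local_rank: int
    world_size: int
    master_addr: str
    master_port: int


def read_torchrun_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read the per-worker variables that torchrun sets (hypothetical helper)."""
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env["RANK"]),              # global rank of this worker
        local_rank=int(env["LOCAL_RANK"]),  # index of the GPU on this node
        world_size=int(env["WORLD_SIZE"]),  # total number of workers
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
    )
```

`LOCAL_RANK` is what a script would typically pass to `torch.cuda.set_device`, while `RANK` and `WORLD_SIZE` replace the hand-set `rank` and `world-size` arguments.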

Thank you for the reply. It's very nice of you! I'll try again tomorrow. I thought there should be a +override.*** when the argument already exists in the YAML, and without...
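The `+` prefix suggests Hydra's command-line override grammar. That is an assumption, since the snippet does not name the tool. In Hydra, plain `key=value` overrides a key that already exists in the config. `+key=value` appends a key that does not exist yet (and fails if it does), and `++key=value` overrides or adds. The plain-Python sketch below only illustrates those three rules on a flat dict. It is not Hydra's code: dotted keys are treated as plain strings and values are not type-converted. `apply_override` is a hypothetical name.

```python
from typing import Any, Dict


def apply_override(cfg: Dict[str, Any], override: str) -> None:
    """Apply one Hydra-style override to a flat dict (illustrative sketch).

    key=value   : override an existing key; error if the key is missing
    +key=value  : append a new key; error if the key already exists
    ++key=value : override if present, otherwise add
    """
    lhs, value = override.split("=", 1)
    if lhs.startswith("++"):
        cfg[lhs[2:]] = value
    elif lhs.startswith("+"):
        key = lhs[1:]
        if key in cfg:
            raise KeyError(f"'{key}' already exists; use {key}={value} or ++{key}={value}")
        cfg[key] = value
    else:
        if lhs not in cfg:
            raise KeyError(f"'{lhs}' is not in the config; use +{lhs}={value} to append it")
        cfg[lhs] = value
```

So for an argument that is already in the YAML file, the plain `key=value` form is the one to use.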

It's clear to me now. Thanks again for the clarification. 👍 I'll try distributed training again tomorrow; hopefully it will work. On Wed, Feb 16, 2022, 00:56 chevalierNoir ***@***.***> wrote: >...

It's really frustrating: I've been working on this for a whole day and I just can't get it right. :-< Here is what I do (I wrote the port number 12356...