Baizhou Huang

12 comments by Baizhou Huang

I just checked the details of the `spawn()` function. For example, suppose we have 2 hosts with 2 processes running on each host. Then `local_rank = dist.get_rank()` will return 0, 1,...
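To make the 2-host, 2-process-per-host example concrete, here is a minimal sketch of the usual rank arithmetic. It assumes ranks are assigned contiguously per host; the helpers `global_rank` and `local_rank_of` are hypothetical names for illustration, not part of `torch.distributed`. Under that assumption the global rank (which is what `dist.get_rank()` reports) runs over all processes in the job, while the per-host index wraps around on each host.

```python
# Hypothetical helpers illustrating global vs. local rank; not a torch API.
NNODES = 2           # number of hosts, as in the example above
NPROC_PER_NODE = 2   # processes per host, as in the example above


def global_rank(node_rank: int, local_rank: int,
                nproc_per_node: int = NPROC_PER_NODE) -> int:
    """Global rank, assuming contiguous per-host assignment."""
    return node_rank * nproc_per_node + local_rank


def local_rank_of(rank: int, nproc_per_node: int = NPROC_PER_NODE) -> int:
    """Index of a process within its own host (e.g. for picking a device)."""
    return rank % nproc_per_node


# Enumerate every (host, local index, global rank) triple in the job.
layout = [(n, l, global_rank(n, l))
          for n in range(NNODES)
          for l in range(NPROC_PER_NODE)]

for node, local, rank in layout:
    print(f"host={node} local_rank={local} global_rank={rank}")
```

If this assumption holds, using the global rank directly as a device index would only be safe on the first host, which may be why the distinction matters here.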

Sorry, but I still don't understand why `q_sample()` returns `x_start` when `t=0`. https://github.com/Shark-NLP/DiffuSeq/blob/bdc8f0adbff22e88c8530d1f20c3c7589c061d40/diffuseq/gaussian_diffusion.py#L612 The parameter `x_start` here equals `x_start_mean + self.sqrt_one_minus_alphas_cumprod[0] * noise` (according to `self._get_x_start()`). So it stands...
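For reference, a scalar sketch of the textbook forward-diffusion step, to show what `t=0` would give without any special case. This is not the DiffuSeq code: the linear beta schedule, `NUM_STEPS`, and the sample values are assumptions chosen for illustration, and `get_x_start` only mirrors the expression quoted above.

```python
import math

# Assumed linear beta schedule; the repository's actual schedule may differ.
NUM_STEPS = 10
betas = [1e-4 + (0.02 - 1e-4) * i / (NUM_STEPS - 1) for i in range(NUM_STEPS)]

# Cumulative product of (1 - beta_t), as in standard Gaussian diffusion.
alphas_cumprod = []
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alphas_cumprod.append(prod)

sqrt_alphas_cumprod = [math.sqrt(a) for a in alphas_cumprod]
sqrt_one_minus_alphas_cumprod = [math.sqrt(1.0 - a) for a in alphas_cumprod]


def get_x_start(x_start_mean: float, noise: float) -> float:
    # Mirrors the expression quoted in the comment above.
    return x_start_mean + sqrt_one_minus_alphas_cumprod[0] * noise


def q_sample(x_start: float, t: int, noise: float) -> float:
    # Textbook forward step, with no special handling of t=0.
    return (sqrt_alphas_cumprod[t] * x_start
            + sqrt_one_minus_alphas_cumprod[t] * noise)


x_start = get_x_start(x_start_mean=1.0, noise=0.5)
x_0 = q_sample(x_start, t=0, noise=-0.3)
print(x_start, x_0)
```

With this generic formula the `t=0` sample is close to `x_start` but not identical to it, so returning `x_start` exactly at `t=0` would be a separate design choice rather than a consequence of the formula.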

Sorry, but I cannot deduce your conclusion from your code. > The model learns the mse loss between α₀x₀ + β₀ * noise and Emb(wx) + β₀ * noise Please give me a hint based on some code...

Same issue with single-node training, and adding more time has no effect.

@hongpeng-guo Does that mean I cannot simultaneously run two verl programs on the same node / ray cluster?

Thanks for the quick response. Is it possible to launch two verl jobs in one cluster with different namespaces?

> As named actors are global unique in each cluster, launching two jobs in the same cluster is an undefined behavior. I'm not very familiar with Ray. Is the reason...

> why do you need to run two verl jobs in a single cluster? It seems like launching two clusters in one node is not recommended :) Thanks anyway! I...