Tommy Mulc

Results: 3 comments by Tommy Mulc

I'm having the same issue, but only when using CeleryExecutor. LocalExecutor seems to work.

Would it be easier to use multiple parameter servers and shard the variables? You will eventually reach a bandwidth limit with one machine so it might be worth implementing this...
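To make the sharding idea concrete, here is a minimal sketch in plain Python, not tied to any framework's API. It assumes a hypothetical helper `shard_variables` and made-up variable names and sizes; it greedily assigns each variable to the least-loaded parameter server so that no single machine carries all of the traffic.

```python
# Hypothetical sketch: greedy, size-aware placement of variables on
# parameter servers. Names and sizes below are invented for illustration.

def shard_variables(var_sizes, num_ps):
    """Assign each variable to the currently least-loaded parameter server.

    var_sizes: dict mapping variable name -> number of parameters.
    num_ps:    number of parameter servers.
    Returns (placement, loads): placement maps name -> server index,
    loads[i] is the total number of parameters held by server i.
    """
    if num_ps < 1:
        raise ValueError("num_ps must be at least 1")
    loads = [0] * num_ps
    placement = {}
    # Place the largest variables first so smaller ones can even out the load.
    ordered = sorted(var_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        target = min(range(num_ps), key=lambda i: loads[i])
        placement[name] = target
        loads[target] += size
    return placement, loads


# Assumed example model: one large embedding plus a few smaller tensors.
var_sizes = {
    "embedding": 5_000_000,
    "encoder/kernel": 1_200_000,
    "decoder/kernel": 1_200_000,
    "attention/kernel": 400_000,
    "output/bias": 30_000,
}
placement, loads = shard_variables(var_sizes, num_ps=2)
for name, server in sorted(placement.items()):
    print(f"{name} -> ps:{server}")
print("parameters per server:", loads)
```

With these assumed sizes, the embedding ends up alone on one server and everything else on the other, which still leaves the load uneven. A single very large variable would itself have to be partitioned across servers to balance bandwidth further.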

I did a non-sanity-check training run (asynchronously on 384 cores) with lr=.00015, norm=ins, loss=l1, min_len=10, max_len=100, and r=5. Additionally, I used Luong attention, gradient clipping by norm of 5,...