Yun Dai

Results: 27 comments by Yun Dai

Hi @romerojosh @maxhgerlach, could you please take a look at this? Thank you so much in advance!

Hi @maxhgerlach, thank you for taking a close look! I missed a closing bracket during the most recent update. Thanks for catching this!

@romerojosh would you mind taking another look to check whether anything was missed here? Thanks a lot! Also, I noticed there are two `Build and Test GPU` tasks...

+1, hitting the same issue. It hangs when 1) I enable TensorBoard on rank 0 only, **and** 2) set `update_freq` to some number rather than the default `'epoch'`

Hi @chongxiaoc, yes, either enabling TensorBoard on all ranks or using the per-epoch update frequency (i.e. don't pass `update_freq` to `TensorBoard()`, only `log_dir`) would work. I was...

I tried removing them but the issue still persists. I think it's that rank 0 traces a different graph from other ranks, because as I modify the MNIST model architecture...

It would be super helpful if the TF team could release spark-tensorflow-connector built with Scala 2.12 as a Maven artifact 🙌 Right now the best workaround seems to be `spark-tfrecord`...

Hi @davidberard98, just a friendly check-in: do we by chance have any update on this issue? Thanks!

@S1ro1 one straightforward idea is to parallelize the expert forward pass (just like what the megablock impl does). Right now in the HF model code the MoE block is performed sequentially, expert by expert. Not sure if...

LGTM. @llllvvuu, as soon as the merge conflict is resolved we can get this in.