ngoyal2707
Great work on GShard. Curious whether there are any plans to open-source it and, if so, on what timeline?
I am using the library to post-process road segmentation, which is just a pixel-level binary classification problem: classify each pixel as 'road' or 'no road'. My code looks as...
It seems that in the current version of `master`, the `he_init` function passes `gain` as the argument `a`. But as per the PyTorch code in both version `1.0` and...
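To make the concern concrete, here is a minimal pure-Python sketch of how the Kaiming uniform bound depends on `a`. It assumes the usual PyTorch convention that `a` is the negative slope of a leaky ReLU and that the gain is derived from it as sqrt(2 / (1 + a^2)). The helper names `leaky_relu_gain` and `kaiming_uniform_bound` are hypothetical and are not taken from the repository under discussion.

```python
import math


def leaky_relu_gain(a: float = 0.0) -> float:
    """Gain for a leaky ReLU with negative slope `a`: sqrt(2 / (1 + a^2))."""
    return math.sqrt(2.0 / (1.0 + a * a))


def kaiming_uniform_bound(fan_in: int, a: float = 0.0) -> float:
    """Half-width of the uniform range: gain * sqrt(3 / fan_in)."""
    return leaky_relu_gain(a) * math.sqrt(3.0 / fan_in)


fan_in = 256                      # illustrative layer width (assumption)
intended_gain = math.sqrt(2.0)    # gain for plain ReLU, i.e. a = 0

# Intended use: `a` is the negative slope, so a = 0 recovers the ReLU gain.
correct_bound = kaiming_uniform_bound(fan_in, a=0.0)

# Hypothetical mistake: the gain value is handed over as the slope `a`.
mistaken_bound = kaiming_uniform_bound(fan_in, a=intended_gain)

print(correct_bound, mistaken_bound, mistaken_bound / correct_bound)
```

Under these assumptions, passing a gain of sqrt(2) as `a` shrinks the effective gain to sqrt(2/3), so the weights would be initialized in a range narrower by a factor of 1/sqrt(3) than intended.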
This branch is what I am using to train the latest model, with the following changes. I will not merge this and will piece out separate PRs for each of the following:...
Calculation on page 65 of the PaLM paper: https://arxiv.org/pdf/2204.02311.pdf
Most likely because we loop over a bunch of tensors (gradient / activation norms, etc.) and move them to the CPU for logging. Weirdly, this happens outside of the WPS and UPS counters,...
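As a rough illustration of the pattern described above, the sketch below contrasts moving each norm to the CPU one at a time with stacking the norms on the device and transferring once. It assumes PyTorch is installed. The function names `norms_per_tensor` and `norms_batched` are hypothetical, and this is not the logging code being discussed, only one common way to reduce the number of host transfers.

```python
import torch


def norms_per_tensor(tensors):
    # One host transfer per tensor: on a GPU, each .item() call blocks
    # until the device has finished the work producing that value.
    return [t.norm().item() for t in tensors]


def norms_batched(tensors):
    # Keep the 0-dim norms on the device, stack them, then copy to the
    # host a single time with .tolist().
    stacked = torch.stack([t.norm() for t in tensors])
    return stacked.tolist()


# Small stand-ins for gradient / activation tensors (illustrative only).
tensors = [torch.full((4,), float(i)) for i in range(1, 4)]

print(norms_per_tensor(tensors))
print(norms_batched(tensors))
```

Both functions return the same values; the batched version just replaces many small synchronizing copies with one, which is usually where the time goes when many tensors are logged per step.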