Stephen Roller
Please comment with docs you think should be updated, added, or clarified.
# Bug description It takes an extremely long time to load multiwoz v22. With the data already downloaded, the train set takes >200 seconds to get to display_data on my...
In #3740, we added support for FullyShardedDataParallel, but limited implementation to that of Zero2, not Zero3. Zero3 results in substantial decreases of memory usage compared with Zero2 while bringing speed...
Please comment with, or +1, your flaky tests. Don't just say "gpu tests"; name the specific tests that are failing.
We have quite a few instances where we have some per-token losses/metrics along with a corresponding mask ```python metric_per_token # torch.Tensor of shape (batchsize, num_tokens) mask # torch.BoolTensor of shape...
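As an illustration of the pattern described above, here is a minimal sketch of averaging a per-token metric over only the unmasked positions of each example. The helper name `masked_mean_per_example` is hypothetical, plain Python lists stand in for the tensors, and returning 0.0 for a fully masked row is an assumption, not something the issue specifies.

```python
from typing import List


def masked_mean_per_example(
    metric_per_token: List[List[float]],
    mask: List[List[bool]],
) -> List[float]:
    """Average each row's metric over the positions where mask is True.

    Hypothetical helper: nested lists stand in for tensors of shape
    (batchsize, num_tokens).
    """
    means = []
    for row, row_mask in zip(metric_per_token, mask):
        # Keep only the values at unmasked (True) positions.
        kept = [value for value, keep in zip(row, row_mask) if keep]
        # Assumption: a fully masked row reports 0.0 instead of dividing by zero.
        means.append(sum(kept) / len(kept) if kept else 0.0)
    return means


metric_per_token = [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
mask = [[True, True, False], [True, False, False]]
print(masked_mean_per_example(metric_per_token, mask))
```

With real tensors the same reduction is usually written as a masked sum divided by the mask count along the token dimension.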
We have a newer model. Let's use it! Current tutorial here: https://colab.research.google.com/drive/1bRMvN0lGXaTF5fuTidgvlAl-Lb41F7AD#scrollTo=KtVz5dCUmFkN
It contains too much copy pasta with the regular interactive web. We should find a way to improve this.
**Description** We need to be able to see metrics (F1, etc.) for individual examples when available.
It's pretty useful to know whether the initializer is being created and using extra memory. We should add a log saying whether it's being hit. It might help with #2942.