Stephen Roller

Results 41 issues of Stephen Roller

**Patch description** **Testing steps** **Other information**

CLA Signed

Please comment with docs you think should be updated, added, or clarified.

never-stale

# Bug description

It takes an extremely long time to load multiwoz v22. With the data already downloaded, the train set takes >200 seconds to get to display_data on my...

Help Wanted
never-stale

In #3740, we added support for FullyShardedDataParallel, but limited implementation to that of Zero2, not Zero3. Zero3 results in substantial decreases of memory usage compared with Zero2 while bringing speed...

never-stale

Please comment with, or +1, your flaky tests. Don't just say "gpu tests"; name the specific tests that are failing.

never-stale

We have quite a few instances where we have some per-token losses/metrics along with a corresponding mask

```python
metric_per_token  # torch.Tensor of shape (batchsize, num_tokens)
mask  # torch.BoolTensor of shape...
```

Enhancement
Help Wanted
Medium
never-stale

We have a newer model. Let's use it! Current tutorial here: https://colab.research.google.com/drive/1bRMvN0lGXaTF5fuTidgvlAl-Lb41F7AD#scrollTo=KtVz5dCUmFkN

Help Wanted
Small
never-stale

It contains too much copy pasta with the regular interactive web. We should find a way to improve this.

Code Quality
Medium
donotreap

**Description** We need to be able to see metrics (F1, etc.) for individual examples when available.

never-stale

It's pretty useful to know whether the initializer is being created and using extra memory. We should add a log saying whether it's being hit. It might help with #2942.

Help Wanted
Minor
never-stale