John DeNero
I think `seq2seq` training is not using multiple GPUs. The `tokens/sec` metric is the same whether I train on a VM with 1 GPU or with 4 GPUs....
It would be great to shard minibatches across multiple GPUs to speed up training. Any pointers on how to get started? I'm just reading the dynet docs now.
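As a rough illustration of what sharding a minibatch means, here is a minimal sketch in plain Python. It only shows the splitting step. `shard_minibatch` and `num_devices` are hypothetical names, not part of dynet or `seq2seq`, and how each shard would be dispatched to a GPU and how gradients would be combined depends on the framework.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_minibatch(batch: Sequence[T], num_devices: int) -> List[List[T]]:
    """Split one minibatch into num_devices contiguous shards.

    Hypothetical helper for illustration only. Shard sizes differ by at
    most one example, and earlier shards receive any extra examples.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(len(batch), num_devices)
    shards: List[List[T]] = []
    start = 0
    for device in range(num_devices):
        # The first `extra` shards take one additional example each.
        size = base + (1 if device < extra else 0)
        shards.append(list(batch[start:start + size]))
        start += size
    return shards


if __name__ == "__main__":
    # Ten examples spread over four (assumed) devices.
    for device, shard in enumerate(shard_minibatch(list(range(10)), 4)):
        print(device, shard)
```

With fewer examples than devices, some shards come back empty, so a real training loop would probably need to skip those devices or pad the batch.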
The grade export job for CS 61A this semester looks like it's going to take about an hour to finish. That's fine, but we could probably make it faster one...
When students receive a "you are not enrolled" message, they should also be presented with a form in which they can request to be enrolled. They would fill in their...
Tasks remaining:
- [ ] Replace .ok files with some minimal YAML
- [ ] Get AI help and follow-up questions working again
- [ ] Run pytest-grader