Andrew
@yl4579 I believe we've done that already. Oh well. If anyone else has tested or can test on an H100, we'd love to hear what's going on. This is a big...
@yl4579 Yes, the output is considerably worse; I shared some examples in the main post. I don't think we've tested whether the module itself causes the bottleneck. cc...
Hmm, has anyone else been able to get past this stage in training yet? It should be straightforward to replicate with that minimal dataset; to us right now it doesn't look like...