Andrew

3 comments by Andrew

@yl4579 I believe we've done that already, oh well. If anyone else has or can test on an H100, we'd love to hear what's going on. This is a big...

@yl4579 Yes, the output is considerably worse; I shared some examples in the main post. I don't think we have tested whether the module itself causes the bottleneck. cc...

Hmm, has anyone else been able to get past this stage in training yet? It should be straightforward to replicate with that minimal dataset; to us right now it doesn't look like...