Martin Simonovsky
Wow, what a great effort! Let's wait for the results of your jobs, and if they're good, I can merge & clean everything up.
Damn, that's really frustrating. I guess it must be some bug in the kernels that manifests itself only under some rare condition of the input data. Could you perhaps run the...
Thanks for your report. Is the crash reproducible on your side, meaning that if you rerun the training from scratch (the first command line above), will it break during episode...
> I launched two experiments and it always crashes in iteration 164 of epoch 7. Great news! Could you please pickle `(inputs, targets, GIs, PIs)` in https://github.com/mys007/ecc/blob/8fbc9019ca8bc2a620617d7477ac5d17ba65e4bf/main.py#L136 and make...
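Something along these lines should do, placed around the linked line of `main.py`; `epoch` and `iteration` are just placeholders for whatever the loop counters are actually called there:

```python
import pickle

# Hypothetical dump of the exact batch that triggers the crash so it can be
# replayed on another machine; if the tensors are on the GPU, you may need to
# move them to CPU first before pickling.
if epoch == 7 and iteration == 164:
    with open('/tmp/crash_batch.pkl', 'wb') as f:
        pickle.dump((inputs, targets, GIs, PIs), f)
```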
Hi, thanks a lot... but when I load the batch on my computer (from either of your files) so that each training iteration runs on it, I get no crash...
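For reference, the replay on my side is roughly this sketch (path and variable names as in the dump above):

```python
import pickle

# Illustrative replay: load the reported batch once and feed it to every
# training iteration instead of fresh data, hoping to reproduce the crash.
with open('/tmp/crash_batch.pkl', 'rb') as f:
    inputs, targets, GIs, PIs = pickle.load(f)
```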
Hi, I was wondering: if you're in a very experimental mood, could you try to run https://github.com/mys007/ecc/tree/crazy_fix with PyTorch 0.3? There is just one extra line which touches `dest`. I...
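Purely as an illustration (this is not the actual line in the `crazy_fix` branch, which may look different), "touching" the output tensor before the custom kernel writes into it could be as simple as:

```python
# Hypothetical example only: force `dest` to be materialized/zeroed before the
# custom kernel writes into it; this changes memory-access timing and may
# mask (not fix) a race condition.
dest.zero_()
```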
Damn, but thanks a lot. Well, actually, there has been one other user who has contacted me by email with the same issue in the meantime (though on Sydney; CUDA...
@HenrryBryant Thanks for your report and thanks for the effort of trying out the experimental branch. I'm sorry that the problem has not been solved. Although the new error message...
Well, what I meant is that "CUDA out of memory" might not be a bug but rather the GPU genuinely running out of memory. Is the GPU completely free before running the...
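You can check with `nvidia-smi` before launching, or report the usage from within the process; a minimal sketch (note that `memory_allocated()`/`memory_cached()` only exist in PyTorch >= 0.4, so on 0.3 `nvidia-smi` is the simpler option):

```python
import torch

# Report GPU memory use from within the process before training starts.
print('allocated:', torch.cuda.memory_allocated())
print('cached:   ', torch.cuda.memory_cached())
```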
@HenrryBryant Thanks for the investigations. That's an interesting observation about DataLoader, but I believe this is just an incidental workaround that changes the timing of kernel runs, and more...
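Just to illustrate the kind of change I mean (the dataset and batch size are placeholders, not the repo's actual settings):

```python
from torch.utils.data import DataLoader

# num_workers=0 runs data preparation in the main process, which shifts the
# timing of kernel launches and can hide a latent race without fixing it.
loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=0)
```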