Freddy Snijder
Did you by any chance use e.g. NVIDIA APEX for float16 training, and compile it in the past with an older CUDA lib installed? I had the same import error...
I can confirm that downgrading from Python 3.7.4 to Python 3.6.10 mitigates this issue. Why does this happen with Python 3.7?
^ I'm very interested in seeing this solved. I have other issues when using Python 3.6, and 3.6 is getting a bit old anyway ;-)
I am using the latest code. I'm currently also digging into this, and it might be related to the fact that I'm working in a `virtualenv`. Does it work...
@taoleicn As a follow-up to my previous comment, I was able to resolve this by adding the path that contains `Python.h` to the `CPATH` environment variable, before running my...
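For anyone hitting the same thing, here is a minimal sketch of that `CPATH` fix done from Python. It assumes `sysconfig.get_paths()["include"]` points at the directory that holds `Python.h` for the active interpreter, which may not be true for every `virtualenv` setup, so check that the header is really there. The helper name `cpath_with_python_headers` is made up for this example.

```python
import os
import sysconfig


def cpath_with_python_headers(existing=None):
    """Return a CPATH value with the Python include directory prepended.

    Hypothetical helper: `existing` defaults to the current CPATH value.
    """
    # Directory where this interpreter expects its C headers (Python.h).
    include_dir = sysconfig.get_paths()["include"]
    if existing is None:
        existing = os.environ.get("CPATH", "")
    # Keep existing entries, dropping empty ones.
    parts = [p for p in existing.split(os.pathsep) if p]
    # Avoid adding the same directory twice.
    if include_dir not in parts:
        parts.insert(0, include_dir)
    return os.pathsep.join(parts)


# Only affects this process and the child processes it starts afterwards
# (for example a compiler launched while building an extension).
os.environ["CPATH"] = cpath_with_python_headers()
print(os.environ["CPATH"])
```

Setting the variable this way has to happen before the extension build is triggered; exporting `CPATH` in the shell beforehand achieves the same thing.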
I also have the exact same error. @chestm007 Any idea what could be the problem here? Thanks.
**Update** I have been able to confirm that training my seq2seq model, based on GRUs, using DeepSpeed with Nvidia APEX AMP does work. In this experiment I used settings: ```...
@briansp2020 Was this ever resolved? I have the same issues: same AMD MI100, same error code -12, 512 GB RAM. But it did work before. I changed it to another PCIe slot,...
Could it be related to memory encryption somehow?
Hi @briansp2020, sorry to hear that. In my case, and this is hot off the press, after trying since yesterday, I got the GPU working again! Searching for...