mattiasmar
Hi guys, any insights with regard to the CUDA issue? Ping @JuergenUniVie
Hi, did you find any solution to this problem?
@awaelchli The example in lightning-tutorials/lightning_examples/distributed-training/main.ipynb is very detailed, which is good. However, it doesn't show how a multi-node job should be launched (e.g. with mpirun). On a K8s cluster, one would...
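For concreteness, here is a minimal sketch of the kind of training script I have in mind; the model, data, node count, and GPU count below are placeholders of my own, not values from the tutorial:

```python
# Minimal sketch, assuming pytorch_lightning and a DDP-capable multi-node setup.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

def main():
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    loader = DataLoader(data, batch_size=64)
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=4,      # GPUs per node (placeholder)
        num_nodes=2,    # number of nodes (placeholder)
        strategy="ddp",
        max_epochs=1,
    )
    trainer.fit(TinyModel(), loader)

if __name__ == "__main__":
    main()
```

What's unclear to me is precisely how this script is supposed to be started on each node of a K8s cluster (via mpirun, torchrun, or a Kubernetes operator), which is exactly the part the tutorial doesn't cover.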
For the purpose of extending Glow to support LSTM structures (not unrolled): could you provide some pointers? Design thoughts? Links to work in progress? Core elements that would need to...
So how would you approach a model like [GNMT](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/GNMT)? Unrolling the network wouldn't be very efficient. How could we best take advantage of Glow without making a major change to...
For reference, I attach a non-PyG example that follows the same parallel-processing procedure and does work: [multi_processing_torch.txt](https://github.com/pyg-team/pytorch_geometric/files/9189809/multi_processing_torch.txt) (please change the file suffix from .txt to .py).
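Since the attachment needs a download, here is a minimal sketch of the same kind of pattern (plain `torch.multiprocessing` workers running forward passes); it is not the attached file, and the model and tensor sizes are placeholders:

```python
# Sketch of the parallel-processing pattern (not the attached file);
# the model and sizes below are placeholders.
import torch
import torch.multiprocessing as mp
from torch import nn

def worker(rank, num_iters):
    # Each process builds its own model and runs a few forward passes.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    for i in range(num_iters):
        x = torch.randn(8, 16)
        out = model(x)
        print(f"worker {rank}, iter {i}, out shape {tuple(out.shape)}", flush=True)

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    procs = [mp.Process(target=worker, args=(r, 3)) for r in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```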
@rusty1s did you use the CPU or the GPU build of PyTorch? (I use `pytorch 1.12.0 py3.9_cpu_0 pytorch`.)
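For completeness, this is a quick way to report which build is in use (standard torch introspection, nothing PyG-specific):

```python
import torch

# Print the installed PyTorch version and whether it is a CUDA build.
print(torch.__version__)          # e.g. "1.12.0"
print(torch.version.cuda)         # None for a CPU-only build
print(torch.cuda.is_available())  # False on a CPU-only build or without a GPU
```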
I added a few more log traces to the GCN forward implementation. Running this code shows that execution doesn't get past the first convolution.

```python
def forward(self, x, edge_index, edge_weight=None):
    print("dropout1", flush=True)
    ...
```
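The instrumented forward is roughly along these lines (the print labels are my own; the model follows the standard two-layer PyG GCN example, so take this as a sketch rather than the exact code):

```python
# Sketch of the instrumented forward; labels and layer names are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index, edge_weight=None):
        print("dropout1", flush=True)
        x = F.dropout(x, p=0.5, training=self.training)
        print("conv1", flush=True)
        x = self.conv1(x, edge_index, edge_weight).relu()  # execution never gets past here
        print("dropout2", flush=True)
        x = F.dropout(x, p=0.5, training=self.training)
        print("conv2", flush=True)
        x = self.conv2(x, edge_index, edge_weight)
        print("done", flush=True)
        return x
```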
Another hint: if I run the code in "Debug" mode (in VS Code), some of the GCN forward calls do execute to the end (but not all).
Yes. The line where the code gets stuck is:

```python
edge_attr = torch.cat([edge_attr[mask], loop_attr], dim=0)
```

in the `add_remaining_self_loops` method of `torch_geometric/utils/loop.py`.
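For anyone trying to reproduce this, dumping the Python stacks after a timeout is a simple way to see the stuck frame without editing the PyG sources (a standard-library sketch; the 60-second timeout is arbitrary):

```python
# Dump the stack of every thread if the process is still running after 60s,
# which makes the hanging frame visible in a stuck worker.
import faulthandler

faulthandler.dump_traceback_later(timeout=60, exit=False)
```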