Bus error (core dumped) in EdgeDataLoader of unsupervised GraphSAGE example (20M edges)
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
- In the example code: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/advanced/train_sampling_unsupervised.py
- Feed a dataset with 10M nodes and 20M edges
- A bus error (core dumped) occurs in EdgeDataLoader before training and in DataLoader during inference (see the reproduction sketch below).
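For reference, a minimal sketch of the kind of setup that hits this, assuming a synthetic random graph of roughly the reported size and the `EdgeDataLoader` / `MultiLayerNeighborSampler` API used in the linked example (the actual dataset, fan-outs, and batch size are not from the report):

```python
import torch
import dgl

# Synthetic graph of roughly the reported size (10M nodes, 20M edges).
num_nodes, num_edges = 10_000_000, 20_000_000
src = torch.randint(0, num_nodes, (num_edges,))
dst = torch.randint(0, num_nodes, (num_edges,))
g = dgl.graph((src, dst), num_nodes=num_nodes)

# All edge IDs as training seeds (~20M), num_workers=0 as in the report.
train_seeds = torch.arange(g.num_edges())
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 25])
dataloader = dgl.dataloading.EdgeDataLoader(
    g, train_seeds, sampler,
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(1),
    batch_size=10000, shuffle=True, drop_last=False, num_workers=0)

# The bus error is reported to happen before the first batch is produced.
for input_nodes, pos_graph, neg_graph, blocks in dataloader:
    break
```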
Expected behavior
The example should run training and inference on this dataset without crashing.
Environment
- DGL Version (e.g., 1.0): 0.8.1cu101
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.8.1+cu101
- OS (e.g., Linux): Docker
- How you installed DGL (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.6.13
- CUDA/cuDNN version (if applicable): CUDA 10.1
- GPU models and configuration (e.g. V100): V100
- Any other relevant information:
Additional context
The number of workers is set to 0 in EdgeDataLoader.
The bug does not appear when the number of train_seeds passed to EdgeDataLoader is small, e.g. 1M.
How many GPUs did you use? Have you changed the args such as --graph-device, --data-device?
> How many GPUs did you use? Have you changed the args such as `--graph-device`, `--data-device`?
One GPU. Nope. I have tried different args but got the same result.
Can you try adding --shm-size=64g (large enough to store the whole graph) to your docker run command?
I cannot change the docker command. I set the number of workers to 0, so I think it doesn't use shared memory.
> Can you try adding `--shm-size=64g` (large enough to store the whole graph) to your `docker run` command?
The docker environment is created automatically. I am not able to change the shm size.
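If the shm size cannot be changed, it may still help to confirm the limit from inside the container; a small Linux-only sketch (not part of the example):

```python
import shutil

# Docker's default /dev/shm is 64 MiB unless --shm-size is passed.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total={total / 1024**2:.0f} MiB, free={free / 1024**2:.0f} MiB")
```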
These lines of code in the dataloader create a shared-memory array for shuffling:
https://github.com/dmlc/dgl/blob/5ba5106acab6a642e9b790e5331ee519112a5623/python/dgl/dataloading/dataloader.py#L146-L149
When len(train_seeds) > 8M, the shared tensor exceeds Docker's default shm size of 64MB.
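A back-of-the-envelope check of that threshold, assuming the seed IDs are stored as int64 (8 bytes each):

```python
bytes_per_id = 8                                  # int64 edge IDs
default_shm = 64 * 1024 ** 2                      # Docker's default /dev/shm: 64 MiB

print(default_shm // bytes_per_id)                # 8388608 -> the ~8M threshold above
print(20_000_000 * bytes_per_id / 1024 ** 2)      # ~152.6 MiB needed for 20M seeds
print(1_000_000 * bytes_per_id / 1024 ** 2)       # ~7.6 MiB, which is why 1M seeds work
```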
@jermainewang @BarclayII
Can we avoid that when users do not intend to use shared memory?
According to the comments, the shared tensor is used for persistent_workers=True (or num_workers > 0 I think?). We can change the code to use shared tensors only when these conditions hold. @BarclayII may know more details.
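A minimal sketch of that proposal, illustrative only and not DGL's actual dataloader code (the helper name and the use of torch's `share_memory_()` here are assumptions; the point is just to gate the shared-memory copy on the worker settings):

```python
import torch

def maybe_share_indices(indices: torch.Tensor, num_workers: int,
                        persistent_workers: bool) -> torch.Tensor:
    """Hypothetical helper: keep the seed IDs in shared memory only when
    worker processes actually need to see shuffles done by the main process."""
    if num_workers > 0 or persistent_workers:
        return indices.share_memory_()  # in-place move into shared memory
    return indices                      # num_workers == 0: plain tensor, no /dev/shm allocation
```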
> We can change the code to use shared tensors only when these conditions hold.
That sounds reasonable. I'm not sure how PyTorch shares the tensor with forked subprocesses though: if PyTorch uses shared memory, then we are technically still copying the ID tensor into shared memory implicitly.
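For what it's worth, the active sharing strategy can be checked via torch.multiprocessing; as far as I know both Linux strategies are ultimately backed by shared-memory segments, which is consistent with that concern (a quick sketch):

```python
import torch.multiprocessing as mp

print(mp.get_all_sharing_strategies())  # {'file_descriptor', 'file_system'} on Linux
print(mp.get_sharing_strategy())        # 'file_descriptor' is the Linux default
```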