TypeError: self.chandle cannot be converted to a Python object for pickling

Open hulihan-start opened this issue 1 year ago • 0 comments

🐛 Bug

When I use the 'forkserver' start method to run a multiprocessing program in Python and try to send the edgesampler object to a subprocess, it raises TypeError: self.chandle cannot be converted to a Python object for pickling. Is there anything I can do to solve this problem? I have tried to put the edgesampler method into each subprocess but

To Reproduce

Steps to reproduce the behavior:

Download DGL-KE and check out the 0.2.0 branch.
Add mp.set_start_method('forkserver') at the beginning of the code.
Modify some functions in score_func.py and general_models.py.
python train.py --model_name ComplEx --dataset FB15k --batch_size 10000 --neg_sample_size 1000 --neg_deg_sample --hidden_dim 100 --lr 0.1 --regularization_coef 0 --batch_size_eval 1000 --mix_cpu_gpu --gpu 0 1 2 3 --max_step 10 --neg_sample_size_eval 2000 --neg_deg_sample_eval --log_interval 1 --no_save_emb --no_eval_filter --eval_interval 2 --test --valid
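
For reference, the start-method change in the second step can be sketched as below. This is a minimal illustration using only the standard library; the helper name `configure_start_method` is hypothetical and is not part of DGL-KE. It assumes the call is made once, before any worker process is created.

```python
import multiprocessing as mp


def configure_start_method(method="forkserver"):
    """Select a multiprocessing start method if the platform supports it.

    Hypothetical helper for illustration; not part of DGL-KE.
    """
    if method in mp.get_all_start_methods():
        # force=True avoids a RuntimeError if a method was already chosen.
        mp.set_start_method(method, force=True)
    # Report whichever method is now active.
    return mp.get_start_method()


if __name__ == "__main__":
    print("start method:", configure_start_method())
```

Unlike fork, the forkserver and spawn methods pickle every argument passed to a child process, so any object handed to `Process(args=...)` must be picklable.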

Expected behavior

When I create the edgesampler inside each subprocess instead, training produces a bad test/validation result. With the default multiprocessing start method (fork), it produces a good test/validation result.
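
The general pattern of building an unpicklable object inside each worker can be sketched as follows. This is not DGL code: `FakeSampler` is a hypothetical stand-in whose generator attribute plays the role of the native `chandle`, and the strided split of edges by `rank` is an assumption about how work might be divided. Whether per-worker partitioning or seeding explains the degraded results above is not established by this report.

```python
import multiprocessing as mp
import pickle


class FakeSampler:
    """Hypothetical stand-in for a sampler that wraps a native handle."""

    def __init__(self, edges, rank, num_workers):
        # Assumption: each worker takes a disjoint, strided slice of the edges.
        self.edges = edges[rank::num_workers]
        # Generators cannot be pickled, which mimics an opaque C handle.
        self.handle = (edge for edge in self.edges)


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


def worker(rank, num_workers, edges, queue):
    # Build the sampler inside the child from plain, picklable arguments.
    sampler = FakeSampler(edges, rank, num_workers)
    queue.put((rank, list(sampler.handle)))


def run_workers(edges, num_workers=2, method="forkserver"):
    """Start one process per rank and collect what each one sampled."""
    ctx = mp.get_context(method)
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(rank, num_workers, edges, queue))
        for rank in range(num_workers)
    ]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return results
```

Only the edge list and integers cross the process boundary here, so nothing resembling the handle is ever pickled. `run_workers` should be called from code guarded by `if __name__ == "__main__":`, since forkserver re-imports the main module in each child.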

Environment

DGL Version: 0.4.3
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 2.0.1
OS (e.g., Linux): Ubuntu 22.04
How you installed DGL (conda, pip, source): pip install dgl==0.4.3
Build command you used (if compiling from source): no
Python version: 3.8.10
CUDA/cuDNN version (if applicable): 12.2
GPU models and configuration (e.g. V100): RTX 3090
Any other relevant information:

May 13 '24 16:05 hulihan-start