TypeError: self.chandle cannot be converted to a Python object for pickling

Open hulihan-start opened this issue 1 year ago • 0 comments

🐛 Bug

When I use the 'forkserver' start method to run a multiprocessing program in Python and try to send the edgesampler object to a subprocess, it raises TypeError: self.chandle cannot be converted to a Python object for pickling. Is there anything I can do to solve this problem? I have tried to put the edgesampler method into each subprocess but

To Reproduce

Steps to reproduce the behavior:

  1. Download DGL-KE and check out the 0.2.0 branch.
  2. Add mp.set_start_method('forkserver') at the beginning of the code.
  3. Modify some functions in score_func.py and general_models.py.
  4. python train.py --model_name ComplEx --dataset FB15k --batch_size 10000 --neg_sample_size 1000 --neg_deg_sample --hidden_dim 100 --lr 0.1 --regularization_coef 0 --batch_size_eval 1000 --mix_cpu_gpu --gpu 0 1 2 3 --max_step 10 --neg_sample_size_eval 2000 --neg_deg_sample_eval --log_interval 1 --no_save_emb --no_eval_filter --eval_interval 2 --test --valid
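For context, under the 'forkserver' (and 'spawn') start method, multiprocessing pickles the arguments passed to each Process, whereas 'fork' lets the child inherit them from the parent's memory. The sketch below shows the same kind of failure without DGL. `FakeSampler` and `can_pickle` are hypothetical names, and a `threading.Lock` is only a stand-in for a native handle such as `chandle`; the real error text comes from DGL's compiled extension and will differ.

```python
import pickle
import threading


class FakeSampler:
    """Hypothetical stand-in for a sampler that wraps a native handle."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        # Assumption: a lock plays the role of the native handle here,
        # because pickle refuses to serialize it.
        self.handle = threading.Lock()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, the step that
    multiprocessing applies to Process arguments under 'forkserver'."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# The wrapper object cannot cross the process boundary,
# but the plain settings used to build it can.
sampler_ok = can_pickle(FakeSampler(batch_size=4))
settings_ok = can_pickle({"batch_size": 4})
```

Checking `can_pickle(obj)` on each argument before starting the processes is a quick way to find which object blocks the 'forkserver' launch.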

Expected behavior

When I create the edgesampler inside each subprocess, training gives bad test/validation results. If I use the default multiprocessing start method (fork), it gives good test/validation results.
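The "create it inside each subprocess" approach can be sketched as follows: pass only plain, picklable settings to the worker and build the sampler there. This is a minimal sketch under stated assumptions, not DGL-KE's code. `build_sampler`, `worker`, `run`, and the `config` keys are all hypothetical, and the list-of-batches "sampler" stands in for the real edge sampler.

```python
import multiprocessing as mp
import random


def build_sampler(config):
    """Hypothetical factory: rebuild a sampler from picklable settings.

    Returns shuffled edge ids split into batches of config["batch_size"].
    """
    rng = random.Random(config["seed"])
    edges = list(range(config["num_edges"]))
    rng.shuffle(edges)
    size = config["batch_size"]
    return [edges[i:i + size] for i in range(0, len(edges), size)]


def worker(rank, config, queue):
    """Process target: construct the sampler here instead of receiving it."""
    # Assumption: each rank gets its own deterministic seed.
    local = dict(config, seed=config["seed"] + rank)
    batches = build_sampler(local)
    queue.put((rank, len(batches)))


def run(num_workers, config):
    """Start the workers with 'forkserver'; only config is pickled."""
    ctx = mp.get_context("forkserver")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(rank, config, queue))
             for rank in range(num_workers)]
    for p in procs:
        p.start()
    results = sorted(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results
```

`run` should be called from a script under an `if __name__ == "__main__":` guard, for example `run(4, {"seed": 0, "num_edges": 10, "batch_size": 4})`. The per-rank seed only makes each worker's shuffling reproducible; it is not a diagnosis of the accuracy difference reported above.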

Environment

  • DGL Version: 0.4.3
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 2.0.1
  • OS (e.g., Linux): Ubuntu 22.04
  • How you installed DGL (conda, pip, source): pip install dgl==0.4.3
  • Build command you used (if compiling from source): no
  • Python version: 3.8.10
  • CUDA/cuDNN version (if applicable): 12.2
  • GPU models and configuration (e.g. V100): RTX 3090
  • Any other relevant information:

hulihan-start, May 13 '24 16:05