Model Transfer Learning error
Hello @rociomer, @suneelbvs, @divnori, @kant, thank you very much for this work!
In the transfer learning task, I completed the pre-processing of two datasets (set_1 and set_2) according to the tutorial and pre-trained the model on set_1, but I got an error when I continued training on set_2 for transfer learning. The error message is below. I noticed that it is a CUDA memory access error; what could be the reason?
- Creating dataset directory /data/user/yanx/GraphINVENT/output_set_2/pretrain/
- Creating model subdirectory /data/user/yanx/GraphINVENT/output_set_2/pretrain/job_0/
- Running job as a normal process.
- Checking that the relevant parameters match those used in preprocessing the dataset. -- Job parameters match preprocessing parameters.
- Run mode: 'train' -- time elapsed: 0.00216 s
- Loading training set properties.
- Defining model. -- Defining scheduler.
- Beginning training.
- Training epoch 53
File "./graphinvent/main.py", line 76, in
result = self.forward(*input, **kwargs)
File "/data/user/yanx/workdir/repo/GraphINVENT/graphinvent/gnn/summation_mpnn.py", line 147, in forward
output = self.APDReadout(hidden_nodes, graph_embeddings)
result = self.forward(*input, **kwargs)
File "/data/user/yanx/workdir/repo/GraphINVENT/graphinvent/gnn/modules.py", line 250, in forward
f_add_1 = self.fAddNet1(node_level_output)
result = self.forward(*input, **kwargs)
File "/data/user/yanx/workdir/repo/GraphINVENT/graphinvent/gnn/modules.py", line 170, in forward
return self.seq(layers_input)
File "/data/user/yanx/workdir/miniconda3/envs/GraphINVENT/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/user/yanx/workdir/miniconda3/envs/GraphINVENT/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/data/user/yanx/workdir/miniconda3/envs/GraphINVENT/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/user/yanx/workdir/miniconda3/envs/GraphINVENT/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 566, in forward
return F.selu(input, self.inplace)
File "/data/user/yanx/workdir/miniconda3/envs/GraphINVENT/lib/python3.8/site-packages/torch/nn/functional.py", line 1323, in selu
result = torch.selu(input)
RuntimeError: CUDA error: an illegal memory access was encountered
Traceback (most recent call last):
File "submit-pre-training.py", line 219, in
submit()
File "submit-pre-training.py", line 145, in submit
subprocess.run([f"{PYTHON_PATH}",
File "/data/user/yanx/Dev/workdir/miniconda3/envs/GraphINVENT/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/data/user/yanx/workdir/miniconda3/envs/GraphINVENT/bin/python', './graphinvent/main.py', '--job-dir', '/data/user/yanx/GraphINVENT/output_set_2/pretrain_set_2/job_0/']' returned non-zero exit status 1.
Thank you very much for your help and I look forward to your reply!
Best regards,
@yx0516 Kudos for your work!
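One way to narrow down the illegal memory access reported above: CUDA kernels are launched asynchronously, so the frame shown in the traceback (here `torch.selu`) is not necessarily the operation that actually failed. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, so the Python traceback should point closer to the real culprit. The snippet below is only a sketch of that idea, not part of GraphINVENT; the helper names `blocking_cuda_env` and `run_with_blocking_cuda` are made up for illustration, and the demo command is a stand-in for the real training command.

```python
import os
import subprocess
import sys


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper: it only sets CUDA_LAUNCH_BLOCKING=1, which the CUDA
    runtime reads at start-up of the child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_cuda(cmd):
    """Run `cmd` (a list of strings) with CUDA_LAUNCH_BLOCKING=1.

    Returns the CompletedProcess so that stdout/stderr can be inspected.
    """
    return subprocess.run(
        cmd,
        env=blocking_cuda_env(),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in command that just echoes the variable; replace it with the
    # real training command when debugging.
    demo_cmd = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])",
    ]
    result = run_with_blocking_cuda(demo_cmd)
    print(result.stdout.strip())
```

In the case above, `cmd` would be the command list from the `CalledProcessError` message (the Python interpreter, `./graphinvent/main.py`, `--job-dir`, and the job directory). If the traceback then points at an indexing or embedding operation, one possible cause (an assumption, not confirmed by the log) is that set_2 was preprocessed with different settings than set_1, so that the tensor sizes no longer match the pre-trained model.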