LineaRE

Out of memory when running on a large dataset; adjusting the batch size doesn't help

Open dbbice opened this issue 4 years ago • 5 comments

	2022-04-08 12:04:26,342 | INFO: ---------- Evaluating on Valid Dataset ----------
	Traceback (most recent call last):
	  File "C:/Users/wsco28/Desktop/linkprediction/LineaRE-master/new code/main.py", line 17, in <module>
	    tt.train()
	  File "C:\Users\wsco28\Desktop\linkprediction\LineaRE-master\new code\traintest.py", line 46, in train
	    metrics = LineaRE.test_step(self.__cal_model, self.__kg.test_data_iterator(test='valid'), True)
	  File "C:\Users\wsco28\Desktop\linkprediction\LineaRE-master\new code\lineare.py", line 137, in test_step
	    sort = model(pos_sample, filter_bias, mode)
	  File "D:\anaconda\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
	    result = self.forward(*input, **kwargs)
	  File "C:\Users\wsco28\Desktop\linkprediction\LineaRE-master\new code\lineare.py", line 33, in forward
	    return self._test(sample, w_or_fb, ht)
	  File "C:\Users\wsco28\Desktop\linkprediction\LineaRE-master\new code\lineare.py", line 62, in _test
	    score = wh * self.ent_embd.weight + (r - wt * t)
	RuntimeError: CUDA out of memory. Tried to allocate 10.30 GiB (GPU 0; 12.00 GiB total capacity; 747.23 MiB already allocated; 9.77 GiB free; 774.00 MiB reserved in total by PyTorch)

Has anyone else run into this problem?
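For a rough sense of scale: the failing line combines a per-triple tensor with self.ent_embd.weight, so the intermediate result is probably broadcast over every entity. The sketch below estimates the size of such a tensor, assuming it has shape [test_batch_size, num_ents, dim] and is stored as float32. The traceback does not show the shapes, so this is an assumption, and the function names and example sizes are hypothetical rather than taken from the issue.

```python
def score_tensor_bytes(test_batch_size: int, num_ents: int, dim: int,
                       bytes_per_elem: int = 4) -> int:
    """Bytes needed for one [test_batch_size, num_ents, dim] tensor.

    bytes_per_elem defaults to 4, i.e. float32 (an assumption).
    """
    return test_batch_size * num_ents * dim * bytes_per_elem


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB, the unit used in the CUDA error message."""
    return num_bytes / 2**30


# Hypothetical sizes, NOT the ones from the issue.
full = score_tensor_bytes(test_batch_size=16, num_ents=100_000, dim=500)
half = score_tensor_bytes(test_batch_size=8, num_ents=100_000, dim=500)
print(f"batch 16: {to_gib(full):.2f} GiB, batch 8: {to_gib(half):.2f} GiB")
```

Under these assumptions the estimate grows linearly in each of the three factors, so halving test_batch_size halves the allocation.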

dbbice, Apr 08 '22 04:04

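There are too many entities. Adjust test_batch_size and set it to a smaller value. This is GPU memory running out during testing. When ranking at test time, a score is computed for every entity, so the amount of computation is test_batch_size * num_ents. Because there are so many entities, the computation is too large, and you cannot test many triples at once. Since num_ents cannot be changed, try reducing test_batch_size. If that still doesn't work, the only option is to change the code: split num_ents as well, compute the scores in batches, and merge them afterwards for ranking.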
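A minimal sketch of the last suggestion (split the entities, score chunk by chunk, merge, then rank), written in PyTorch. It is not LineaRE's code: an L1 distance between a query vector and each entity embedding stands in for the model's actual score, the filtered setting (filter_bias) is left out, and chunked_scores, ranks_of and chunk_size are hypothetical names.

```python
import torch


@torch.no_grad()
def chunked_scores(ent_embd, query, chunk_size):
    """Score every entity for every query, one entity chunk at a time.

    ent_embd: [num_ents, dim], query: [batch, dim] -> scores: [batch, num_ents].
    Lower score means a better match (stand-in L1 distance, an assumption).
    """
    parts = []
    for start in range(0, ent_embd.size(0), chunk_size):
        chunk = ent_embd[start:start + chunk_size]        # [c, dim]
        # Only a [batch, c, dim] intermediate exists at any one time.
        diff = chunk.unsqueeze(0) - query.unsqueeze(1)    # [batch, c, dim]
        parts.append(diff.abs().sum(dim=-1))              # [batch, c]
    # Merge the per-chunk scores back into one [batch, num_ents] matrix.
    return torch.cat(parts, dim=1)


def ranks_of(scores, true_idx):
    """Rank of the true entity: 1 + number of entities with a lower score."""
    true_score = scores.gather(1, true_idx.unsqueeze(1))  # [batch, 1]
    return 1 + (scores < true_score).sum(dim=1)


# Toy data: 10 one-dimensional entity embeddings 0.0 .. 9.0, two queries.
ent_embd = torch.arange(10, dtype=torch.float32).unsqueeze(1)
query = torch.tensor([[0.0], [7.0]])
true_idx = torch.tensor([2, 7])

scores = chunked_scores(ent_embd, query, chunk_size=3)
ranks = ranks_of(scores, true_idx)
print(ranks.tolist())
```

The merged [batch, num_ents] score matrix is small compared with the [batch, num_ents, dim] intermediate, which is why chunking only the scoring step can be enough; chunk_size trades speed against peak memory.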

pengyanhui, Apr 08 '22 06:04

Thank you for your reply; that problem is solved. I noticed that your code also evaluates on relation patterns:

	---------- Evaluating on Symmetry Dataset ----------
	2022-03-31 20:00:54,760 | INFO: Evaluating the model... (0/136)
	2022-03-31 20:01:10,378 | INFO: Symmetry MR at step 100000: 1.152906
	2022-03-31 20:01:10,378 | INFO: Symmetry MRR at step 100000: 0.924659
	2022-03-31 20:01:10,378 | INFO: Symmetry HITS@1 at step 100000: 0.850092
	2022-03-31 20:01:10,378 | INFO: Symmetry HITS@3 at step 100000: 0.999539
	2022-03-31 20:01:10,378 | INFO: Symmetry HITS@10 at step 100000: 1.000000
	2022-03-31 20:01:10,379 | INFO: ---------- Evaluating on Inversion Dataset ----------
	2022-03-31 20:01:15,470 | INFO: Evaluating the model... (0/1298)
	2022-03-31 20:01:51,582 | INFO: Evaluating the model... (500/1298)
	2022-03-31 20:02:33,427 | INFO: Evaluating the model... (1000/1298)
	2022-03-31 20:02:55,014 | INFO: Inversion MR at step 100000: 2.595345
	2022-03-31 20:02:55,015 | INFO: Inversion MRR at step 100000: 0.955678
	2022-03-31 20:02:55,015 | INFO: Inversion HITS@1 at step 100000: 0.934090
	2022-03-31 20:02:55,015 | INFO: Inversion HITS@3 at step 100000: 0.973639
	2022-03-31 20:02:55,015 | INFO: Inversion HITS@10 at step 100000: 0.989393
	2022-03-31 20:02:55,015 | INFO: ---------- Evaluating on Composition Dataset ----------
	2022-03-31 20:03:00,054 | INFO: Evaluating the model... (0/414)
	2022-03-31 20:03:35,678 | INFO: Composition MR at step 100000: 139.645111
	2022-03-31 20:03:35,679 | INFO: Composition MRR at step 100000: 0.451203
	2022-03-31 20:03:35,679 | INFO: Composition HITS@1 at step 100000: 0.360142
	2022-03-31 20:03:35,679 | INFO: Composition HITS@3 at step 100000: 0.491867
	2022-03-31 20:03:35,679 | INFO: Composition HITS@10 at step 100000: 0.622418
	2022-03-31 20:03:35,679 | INFO: ---------- Evaluating on Other Dataset ----------
	2022-03-31 20:03:40,675 | INFO: Evaluating the model... (0/2)
	2022-03-31 20:03:46,374 | INFO: Other MR at step 100000: 1225.594604
	2022-03-31 20:03:46,375 | INFO: Other MRR at step 100000: 0.251871
	2022-03-31 20:03:46,375 | INFO: Other HITS@1 at step 100000: 0.162162
	2022-03-31 20:03:46,375 | INFO: Other HITS@3 at step 100000: 0.297297
	2022-03-31 20:03:46,375 | INFO: Other HITS@10 at step 100000: 0.432432

However, only FB15K provides test sets for these relation patterns; the other datasets provide only training, validation, and test sets (the same is true of my own dataset). Because my dataset is so large, the run fails at "Evaluating on Symmetry Dataset" once training has finished. If I comment out the following code in traintest.py, will it run? Training is very slow and has not finished yet, so I wanted to ask you first.

	# relation patterns
	test_datasets_str = ['Symmetry', 'Inversion', 'Composition', 'Other']
	for dataset_str in test_datasets_str:
		test_data_list = self.__kg.test_data_iterator(test=dataset_str)
		if len(test_datasets_str[0]) == 0:
			continue
		logging.info(f'---------- Evaluating on {dataset_str} Dataset ----------')
		metrics = LineaRE.test_step(self.__cal_model, test_data_list)
		self._log_metrics(dataset_str, step, metrics)

dbbice, Apr 09 '22 03:04

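That should work. Alternatively, you can put a few empty files in place, so that the number of triples is 0; those datasets are then skipped during testing (if len(test_datasets_str[0]) == 0: continue).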
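One detail may be worth double-checking before relying on the empty-file route: as quoted, the condition measures test_datasets_str[0], which is the string 'Symmetry', so its length does not depend on what was loaded. The sketch below shows a check on the loaded data instead. It assumes, without confirmation from the thread, that the iterator result is a sequence whose first element holds the triples; should_skip and the stand-in data are hypothetical.

```python
test_datasets_str = ['Symmetry', 'Inversion', 'Composition', 'Other']

# The quoted condition looks at the first *name*, not at the loaded data.
print(len(test_datasets_str[0]))  # length of the string 'Symmetry'


def should_skip(test_data_list) -> bool:
    """Hypothetical helper: True when the loaded dataset has no triples.

    Assumes test_data_list is a sequence whose first element holds the
    triples, or is itself empty when the file is empty.
    """
    return len(test_data_list) == 0 or len(test_data_list[0]) == 0


# Stand-in data: 'Symmetry' has two triples, the others come from empty files.
loaded = {
    'Symmetry': [[(0, 0, 1), (1, 0, 0)]],
    'Inversion': [[]],
    'Composition': [],
    'Other': [[]],
}

evaluated = [name for name in test_datasets_str if not should_skip(loaded[name])]
print(evaluated)
```

If the real iterator returns something else, the same idea applies: test the emptiness of the object that was just loaded for dataset_str.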

pengyanhui, Apr 09 '22 03:04

OK, thank you very much!

dbbice, Apr 09 '22 03:04

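Hi, I have a question. Only FB15k comes with data files for symmetry, inversion, and composition, and I would like to run the same tests on other datasets as well. Is there any public code for this?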

stevenlyf428, Jun 06 '22 03:06