RuntimeError: Job 634fdaf9-f361-4482-96bd-a525e834a435 failed for more than 3 times

Open anrushan opened this issue 4 years ago • 9 comments

When I run dpgen, I get an error: RuntimeError: Job 634fdaf9-f361-4482-96bd-a525e834a435 failed for more than 3 times

The dpgen.log is:

2021-07-23 18:29:49,654 - INFO : start running
2021-07-23 18:30:02,702 - INFO : start running
2021-07-23 18:30:48,067 - INFO : start running
2021-07-23 18:31:02,104 - INFO : start running
2021-07-23 18:33:50,590 - INFO : start running
2021-07-24 10:00:13,923 - INFO : start running
2021-07-24 17:19:03,753 - INFO : start running
2021-07-24 17:20:19,307 - INFO : start running
2021-07-24 17:20:19,308 - INFO : =============================iter.000000==============================
2021-07-24 17:20:19,308 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:20:45,372 - INFO : start running
2021-07-24 17:20:45,372 - INFO : =============================iter.000000==============================
2021-07-24 17:20:45,372 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:34:05,879 - INFO : start running
2021-07-24 17:34:05,879 - INFO : =============================iter.000000==============================
2021-07-24 17:34:05,879 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:45:32,958 - INFO : start running
2021-07-24 17:45:32,959 - INFO : =============================iter.000000==============================
2021-07-24 17:45:32,959 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:46:55,439 - INFO : start running
2021-07-24 17:46:55,440 - INFO : =============================iter.000000==============================
2021-07-24 17:46:55,440 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:47:47,988 - INFO : start running
2021-07-24 17:47:47,989 - INFO : =============================iter.000000==============================
2021-07-24 17:47:47,989 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:48:44,760 - INFO : start running
2021-07-24 17:48:44,760 - INFO : =============================iter.000000==============================
2021-07-24 17:48:44,760 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:52:36,699 - INFO : start running
2021-07-24 17:52:36,700 - INFO : =============================iter.000000==============================
2021-07-24 17:52:36,700 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:54:57,753 - INFO : start running
2021-07-24 17:54:57,754 - INFO : =============================iter.000000==============================
2021-07-24 17:54:57,754 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 22:00:51,771 - INFO : start running
2021-07-24 22:01:35,038 - INFO : start running
2021-07-24 22:01:55,319 - INFO : start running
2021-07-24 22:01:55,321 - INFO : =============================iter.000000==============================
2021-07-24 22:01:55,321 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 22:01:55,340 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-24 22:01:55,360 - INFO : new submission of 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-24 22:01:55,377 - INFO : new submission of f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-24 22:01:55,401 - INFO : new submission of b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-24 22:01:55,415 - INFO : new submission of 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-24 22:02:56,008 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:02:56,220 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:02:56,424 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:02:56,629 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-24 22:03:56,909 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:03:57,141 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:03:57,312 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:03:57,603 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-24 22:04:57,875 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:04:58,089 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:04:58,298 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:04:58,438 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-26 15:42:52,266 - INFO : start running
2021-07-26 15:42:52,267 - INFO : continue from iter 000 task 00
2021-07-26 15:42:52,267 - INFO : =============================iter.000000==============================
2021-07-26 15:42:52,267 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:42:52,395 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:42:52,471 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:42:52,557 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:42:52,681 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-26 15:44:40,477 - INFO : start running
2021-07-26 15:44:40,478 - INFO : continue from iter 000 task 00
2021-07-26 15:44:40,478 - INFO : =============================iter.000000==============================
2021-07-26 15:44:40,478 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:44:40,603 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:44:40,728 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:44:40,843 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:44:40,980 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-26 15:47:37,001 - INFO : start running
2021-07-26 15:47:37,001 - INFO : continue from iter 000 task 00
2021-07-26 15:47:37,001 - INFO : =============================iter.000000==============================
2021-07-26 15:47:37,001 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:47:37,111 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:47:37,221 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:47:37,346 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:47:37,445 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:19:47,191 - INFO : start running
2021-08-26 15:21:24,977 - INFO : start running
2021-08-26 15:22:22,593 - INFO : start running
2021-08-26 15:22:49,760 - INFO : start running
2021-08-26 15:23:56,044 - INFO : start running
2021-08-26 15:24:13,861 - INFO : start running
2021-08-26 15:24:13,862 - INFO : continue from iter 000 task 00
2021-08-26 15:24:13,862 - INFO : =============================iter.000000==============================
2021-08-26 15:24:13,862 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:24:13,980 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 15:24:14,067 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 15:24:14,174 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 15:24:14,303 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:26:32,980 - INFO : start running
2021-08-26 15:26:32,980 - INFO : continue from iter 000 task 00
2021-08-26 15:26:32,980 - INFO : =============================iter.000000==============================
2021-08-26 15:26:32,980 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:27:56,565 - INFO : start running
2021-08-26 15:27:56,566 - INFO : continue from iter 000 task 00
2021-08-26 15:27:56,566 - INFO : =============================iter.000000==============================
2021-08-26 15:27:56,566 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:35:21,767 - INFO : start running
2021-08-26 15:35:21,767 - INFO : continue from iter 000 task 00
2021-08-26 15:35:21,767 - INFO : =============================iter.000000==============================
2021-08-26 15:35:21,767 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:25,332 - INFO : start running
2021-08-26 15:52:25,333 - INFO : continue from iter 000 task 00
2021-08-26 15:52:25,333 - INFO : =============================iter.000000==============================
2021-08-26 15:52:25,333 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:25,334 - INFO : cannot find key "batch" in machine file, try to use deprecated key "machine_type"
2021-08-26 15:52:58,877 - INFO : start running
2021-08-26 15:52:58,877 - INFO : continue from iter 000 task 00
2021-08-26 15:52:58,877 - INFO : =============================iter.000000==============================
2021-08-26 15:52:58,877 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:58,878 - INFO : cannot find key "batch" in machine file, try to use deprecated key "machine_type"
2021-08-26 15:53:53,236 - INFO : start running
2021-08-26 15:53:53,237 - INFO : continue from iter 000 task 00
2021-08-26 15:53:53,237 - INFO : =============================iter.000000==============================
2021-08-26 15:53:53,237 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:53:53,359 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 15:53:53,447 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 15:53:53,580 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 15:53:53,688 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:55:55,213 - INFO : start running
2021-08-26 15:55:55,214 - INFO : continue from iter 000 task 00
2021-08-26 15:55:55,214 - INFO : =============================iter.000000==============================
2021-08-26 15:55:55,214 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:58:15,563 - INFO : start running
2021-08-26 15:58:15,564 - INFO : continue from iter 000 task 00
2021-08-26 15:58:15,564 - INFO : =============================iter.000000==============================
2021-08-26 15:58:15,564 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:09:54,493 - INFO : start running
2021-08-26 16:10:23,189 - INFO : start running
2021-08-26 16:11:30,986 - INFO : start running
2021-08-26 16:12:13,422 - INFO : start running
2021-08-26 16:12:37,819 - INFO : start running
2021-08-26 16:12:43,941 - INFO : start running
2021-08-26 16:12:43,941 - INFO : continue from iter 000 task 00
2021-08-26 16:12:43,941 - INFO : =============================iter.000000==============================
2021-08-26 16:12:43,941 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:15:20,562 - INFO : start running
2021-08-26 16:15:20,563 - INFO : continue from iter 000 task 00
2021-08-26 16:15:20,563 - INFO : =============================iter.000000==============================
2021-08-26 16:15:20,563 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:15:31,998 - INFO : start running
2021-08-26 16:15:31,998 - INFO : continue from iter 000 task 00
2021-08-26 16:15:31,998 - INFO : =============================iter.000000==============================
2021-08-26 16:15:31,998 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:18:01,501 - INFO : start running
2021-08-26 16:18:01,502 - INFO : continue from iter 000 task 00
2021-08-26 16:18:01,502 - INFO : =============================iter.000000==============================
2021-08-26 16:18:01,502 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:18:34,938 - INFO : start running
2021-08-26 16:18:51,220 - INFO : start running
2021-08-26 16:18:51,220 - INFO : continue from iter 000 task 00
2021-08-26 16:18:51,221 - INFO : =============================iter.000000==============================
2021-08-26 16:18:51,221 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:19:37,939 - INFO : start running
2021-08-26 16:19:37,940 - INFO : continue from iter 000 task 00
2021-08-26 16:19:37,940 - INFO : =============================iter.000000==============================
2021-08-26 16:19:37,941 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:19:38,037 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:19:38,122 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:19:38,264 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:19:38,388 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:30:27,385 - INFO : start running
2021-08-26 16:30:27,386 - INFO : continue from iter 000 task 00
2021-08-26 16:30:27,386 - INFO : =============================iter.000000==============================
2021-08-26 16:30:27,386 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:30:27,479 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:30:27,603 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:30:27,739 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:30:27,872 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:33:27,441 - INFO : start running
2021-08-26 16:33:27,441 - INFO : continue from iter 000 task 00
2021-08-26 16:33:27,441 - INFO : =============================iter.000000==============================
2021-08-26 16:33:27,442 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:33:27,528 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:33:27,606 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:33:27,737 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:33:27,817 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:37:12,955 - INFO : start running
2021-08-26 16:37:12,955 - INFO : continue from iter 000 task 00
2021-08-26 16:37:12,955 - INFO : =============================iter.000000==============================
2021-08-26 16:37:12,955 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:40:04,885 - INFO : start running
2021-08-26 16:40:04,886 - INFO : continue from iter 000 task 00
2021-08-26 16:40:04,886 - INFO : =============================iter.000000==============================
2021-08-26 16:40:04,886 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:40:05,016 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:40:05,139 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:40:05,244 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:40:05,330 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:16:30,001 - INFO : start running
2021-08-26 17:16:30,002 - INFO : continue from iter 000 task 00
2021-08-26 17:16:30,002 - INFO : =============================iter.000000==============================
2021-08-26 17:16:30,002 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:16:30,088 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:16:30,169 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:16:30,290 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:16:30,375 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:20:23,491 - INFO : start running
2021-08-26 17:20:23,491 - INFO : continue from iter 000 task 00
2021-08-26 17:20:23,491 - INFO : =============================iter.000000==============================
2021-08-26 17:20:23,491 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:20:23,591 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:20:23,690 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:20:23,771 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:20:23,858 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:23:07,873 - INFO : start running
2021-08-26 17:23:07,873 - INFO : continue from iter 000 task 00
2021-08-26 17:23:07,873 - INFO : =============================iter.000000==============================
2021-08-26 17:23:07,874 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:23:07,992 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:23:08,125 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:23:08,266 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:23:08,406 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:33:49,911 - INFO : start running
2021-08-26 17:33:49,912 - INFO : continue from iter 000 task 00
2021-08-26 17:33:49,912 - INFO : =============================iter.000000==============================
2021-08-26 17:33:49,912 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:33:50,008 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:33:50,088 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:33:50,257 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:33:50,372 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:34:11,435 - INFO : start running
2021-08-26 17:34:11,436 - INFO : continue from iter 000 task 00
2021-08-26 17:34:11,436 - INFO : =============================iter.000000==============================
2021-08-26 17:34:11,436 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:34:11,518 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:34:11,621 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:34:11,714 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:34:11,872 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
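
In this log each of the four training jobs is reported as "terminated, submit again" three times, which lines up with the "failed for more than 3 times" error. A minimal sketch for tallying those messages per job follows. It assumes only the message format visible in this dpgen.log; the name `count_resubmits` is hypothetical and is not part of dpgen.

```python
import re
from collections import Counter

# Matches messages such as:
# "... - INFO : job <36-character job id> terminated, submit again"
RESUBMIT = re.compile(r"job ([0-9a-f-]{36}) terminated, submit again")


def count_resubmits(log_text):
    """Count 'terminated, submit again' messages per job ID."""
    return Counter(RESUBMIT.findall(log_text))


# Three lines copied from the log above, used as sample input.
sample = """\
2021-07-24 22:02:56,008 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:03:56,909 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:04:57,875 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
"""

for job_id, n in count_resubmits(sample).items():
    print(job_id, n)
```

To run it on a real file, pass the file's contents, for example `count_resubmits(open("dpgen.log").read())`.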

The machine.json is:

{
  "_comment": "training on localhost ",
  "_comment": "This is for DeePMD-kit 1.*",
  "train_command": "/home/jiangjun/miniconda3/bin/dp",
  "train_machine": {
    "batch": "shell",
    "work_path": "/home/jiangjun/cptwokile/1/c1/data"
  },
  "train_resources": {
    "envs": { }
  },

  "_comment": "model_devi on localhost ",
  "lmp_command": "/usr/bin/lmp_mpi",
  "model_devi_group_size": 5,
  "model_devi_machine": {
    "batch": "shell",
    "_comment": "If lazy_local is true, calculations are done directly in current folders.",
    "lazy_local": true
  },
  "model_devi_resources": {
  },

  "_comment": "fp on localhost ",
  "fp_command": "/home/jiangjun/cptwokile/cp2k-8.1/exe/local/cp2k.ssmp",
  "fp_group_size": 2,
  "fp_machine": {
    "batch": "shell",
    "work_path": "/home/jiangjun/cptwokile/1/c1/data",
    "_comment": "that's all"
  },
  "fp_resources": {
    "module_list": ["mpi"],
    "task_per_node": 4,
    "with_mpi": true,
    "_comment": "that's all"
  },

  "_comment": " that's all "
}

Do I need to do anything special for batch type shell? My machine is a single local server without LSF or any other scheduler.
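
One thing that can be checked independently of the batch type is whether the commands named in machine.json are actually executable on the machine that runs the jobs. The sketch below is a rough sanity check. It assumes the three command keys used in the file above, and `missing_commands` is a hypothetical helper, not a dpgen function.

```python
import shutil
import sys

# Keys taken from the machine.json above; each holds a command to be run.
COMMAND_KEYS = ("train_command", "lmp_command", "fp_command")


def missing_commands(machine):
    """Return the keys whose command cannot be found or is not executable."""
    missing = []
    for key in COMMAND_KEYS:
        command = machine.get(key)
        if not command:
            continue  # key absent or empty: nothing to check
        executable = command.split()[0]  # drop any arguments after the program
        if shutil.which(executable) is None:
            missing.append(key)
    return missing


# Demo with made-up values: one path that should not exist, one that does.
demo_machine = {
    "train_command": "/nonexistent/bin/dp",
    "lmp_command": sys.executable,
}
print(missing_commands(demo_machine))
```

For a real file, load it first with `json.load(open("machine.json"))`. Python's `json` module keeps only the last value of a repeated key such as `"_comment"`, which does not affect the command keys.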

anrushan, Aug 26 '21 09:08

The error may be printed to train.log, I guess. I'm facing a similar error and trying to debug it... qaq

zhao-w-en, Aug 27 '21 10:08

It seems to have happened during model training. Please check train.log in your work path /home/jiangjun/cptwokile/1/c1/data/ for the specific error information.
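
If it is not obvious where train.log ended up, a recursive search below the work path can collect the last lines of every copy. This is a minimal sketch: `tail_train_logs` is a hypothetical helper, and the demo uses a throwaway directory with a made-up layout rather than the real work path.

```python
import tempfile
from pathlib import Path


def tail_train_logs(work_path, n_lines=20):
    """Return the last n_lines of every train.log found below work_path."""
    tails = {}
    for log in sorted(Path(work_path).rglob("train.log")):
        lines = log.read_text(errors="replace").splitlines()
        tails[str(log)] = lines[-n_lines:]
    return tails


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "000"
    task_dir.mkdir()
    (task_dir / "train.log").write_text(
        "step 1\nAssertionError: ntypes should match that found in data\n"
    )
    demo = tail_train_logs(tmp, n_lines=1)

for path, lines in demo.items():
    print(path, lines)
```

On the real machine the call would be `tail_train_logs("/home/jiangjun/cptwokile/1/c1/data/")`.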

Ericwang6, Aug 28 '21 03:08

It seems to have happened during model training. Please check train.log in your work path /home/jiangjun/cptwokile/1/c1/data/ for the specific error information.

Thanks for your reply. I checked the train.log, and the error is:

Traceback (most recent call last):
  File "/home/jiangjun/miniconda3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/main.py", line 73, in main
    train(args)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 87, in train
    _do_work(jdata, run_opt)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 140, in _do_work
    model.build (data, stop_batch)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/Trainer.py", line 215, in build
    assert (self.ntypes >= data.get_ntypes()), "ntypes should match that found in data"
AssertionError: ntypes should match that found in data

I don't understand what this message means. I hope you can help me. Thanks again for your reply.

anrushan, Aug 28 '21 08:08

Make sure that the number of atom types in your training data (you can check it in type_map.raw) is not larger than the length of the type_map parameter in param.json. The failing assertion is self.ntypes >= data.get_ntypes(), so the two may be equal.
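
That comparison can be done by hand or with a few lines of Python. The sketch below assumes that type_map.raw lists the element names separated by whitespace and that the parameter file has a top-level "type_map" list. `compare_type_maps` is a hypothetical helper, and the element names in the demo are invented.

```python
import json


def compare_type_maps(type_map_raw_text, param_json_text):
    """Compare element names in the data with type_map in the parameter file."""
    data_types = type_map_raw_text.split()
    param_types = json.loads(param_json_text)["type_map"]
    return {
        "n_data": len(data_types),
        "n_param": len(param_types),
        # Names present in the data but not declared in the parameter file.
        "missing_from_param": [t for t in data_types if t not in param_types],
        # Mirrors the assertion: declared types must cover the data types.
        "ok": len(data_types) <= len(param_types),
    }


# Invented example: the data has three elements, the parameter file only two.
report = compare_type_maps("H\nO\nC\n", '{"type_map": ["H", "O"]}')
print(report)
```

With real files, pass `open("type_map.raw").read()` and `open("param.json").read()` as the two arguments.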

Ericwang6, Aug 28 '21 08:08

Make sure that the number of atom types in your training data (you can check it in type_map.raw) is not larger than the length of the type_map parameter in param.json.

The sel did not match the types; I added the number for each type. Thanks for your reply! But I have now met a new error:

2021-08-28 21:29:19,591 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 finished
2021-08-28 21:29:19,597 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 finished
2021-08-28 21:29:19,602 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 finished
2021-08-28 21:29:19,607 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a finished
2021-08-28 21:29:19,613 - INFO : -------------------------iter.000000 task 02--------------------------
2021-08-28 21:29:19,614 - INFO : -------------------------iter.000000 task 03--------------------------
2021-08-28 21:29:19,648 - INFO : -------------------------iter.000000 task 04--------------------------
2021-08-28 21:29:19,649 - INFO : Dispatcher switches to the lazy local mode
2021-08-28 21:29:19,666 - INFO : new submission of 4c01fc60-c9b8-444e-8d5a-635d3f592fac for chunk 2a332aeea901b380d1755cc375d3fcb7e993ecae
2021-08-28 21:30:19,944 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:31:20,209 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:32:20,459 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:34:15,982 - INFO : start running
2021-08-28 21:34:15,983 - INFO : continue from iter 000 task 03
2021-08-28 21:34:15,983 - INFO : =============================iter.000000==============================
2021-08-28 21:34:15,983 - INFO : -------------------------iter.000000 task 04--------------------------
2021-08-28 21:34:15,984 - INFO : Dispatcher switches to the lazy local mode
2021-08-28 21:34:16,075 - INFO : restart from old submission 4c01fc60-c9b8-444e-8d5a-635d3f592fac for chunk 2a332aeea901b380d1755cc375d3fcb7e993ecae

May I ask how to solve this problem? Thanks for your reply!

anrushan, Aug 28 '21 13:08

ERROR: Unknown pair style deepmd (../force.cpp:262)
Last command: pair_style deepmd ../graph.000.pb ../graph.001.pb ../graph.002.pb ../graph.003.pb out_freq ${THERMO_FREQ} out_file model_devi.out

ERROR: Unknown pair style deepmd (../force.cpp:262) appears in the LAMMPS log.

anrushan, Aug 28 '21 13:08

ERROR: Unknown pair style deepmd (../force.cpp:262)
Last command: pair_style deepmd ../graph.000.pb ../graph.001.pb ../graph.002.pb ../graph.003.pb out_freq ${THERMO_FREQ} out_file model_devi.out

ERROR: Unknown pair style deepmd (../force.cpp:262) appears in the LAMMPS log.

Can you make sure that you installed a version of LAMMPS that is compatible with DeePMD-kit?
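
A quick way to test this is to look for the deepmd pair style in the style list that the LAMMPS executable prints with its `-h` option. The following is a rough sketch that assumes the help output lists style names separated by whitespace. Both function names are hypothetical, and the demo strings are invented rather than real LAMMPS output.

```python
import subprocess


def has_deepmd_pair_style(help_text):
    """True if 'deepmd' appears as a separate word in the given help text."""
    return "deepmd" in help_text.split()


def lammps_help(lmp_command):
    """Run '<lmp_command> -h' and return its standard output as text."""
    result = subprocess.run(
        [lmp_command, "-h"], capture_output=True, text=True, check=False
    )
    return result.stdout


# Demo on invented text; a real check would call lammps_help() first.
print(has_deepmd_pair_style("lj/cut  deepmd  eam"))
print(has_deepmd_pair_style("lj/cut  eam"))
```

For the setup in this thread the real check would be `has_deepmd_pair_style(lammps_help("/usr/bin/lmp_mpi"))`.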

AnguseZhang, Aug 31 '21 08:08

I am facing the same problem. I checked the type_map.raw file and the type_map in the parameter.json file, but the problem was not solved. In addition, I checked the train.log file in the work path, and it shows the result below.

/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 15: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 29: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
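
"Argument list too long" is the message a shell prints when starting a program fails with E2BIG, which happens when the command-line arguments plus the environment exceed a system limit. The sketch below estimates that size for a given command and compares it with the limit reported by the system. `exec_size` is a hypothetical helper, the demo values are invented, and the estimate ignores per-string limits and pointer overhead.

```python
import os


def exec_size(argv, env):
    """Approximate bytes passed to exec: each string plus a NUL terminator."""
    strings = list(argv) + [f"{key}={value}" for key, value in env.items()]
    return sum(len(os.fsencode(s)) + 1 for s in strings)


# System-wide limit for arguments plus environment, where available.
limit = os.sysconf("SC_ARG_MAX") if hasattr(os, "sysconf") else None

# Invented demo command and environment.
size = exec_size(["dp", "train"], {"A": "b"})
print(size, limit)
```

Calling `exec_size(command_words, dict(os.environ))` inside the job's environment gives a rough idea of how close a real command is to the limit.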

ChesterCs-thu, Oct 26 '21 08:10

I am facing the same problem. I checked the type_map.raw file and the type_map in the parameter.json file, but the problem was not solved. In addition, I checked the train.log file in the work path, and it shows the result below.

/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 15: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 29: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long

Please provide your input files to help locate and reproduce the problem.

HuangJiameng, Jul 12 '22 08:07