RuntimeError: Job 634fdaf9-f361-4482-96bd-a525e834a435 failed for more than 3 times
When I run dpgen, I get an error: RuntimeError: Job 634fdaf9-f361-4482-96bd-a525e834a435 failed for more than 3 times
The dpgen.log:
2021-07-23 18:29:49,654 - INFO : start running
2021-07-23 18:30:02,702 - INFO : start running
2021-07-23 18:30:48,067 - INFO : start running
2021-07-23 18:31:02,104 - INFO : start running
2021-07-23 18:33:50,590 - INFO : start running
2021-07-24 10:00:13,923 - INFO : start running
2021-07-24 17:19:03,753 - INFO : start running
2021-07-24 17:20:19,307 - INFO : start running
2021-07-24 17:20:19,308 - INFO : =============================iter.000000==============================
2021-07-24 17:20:19,308 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:20:45,372 - INFO : start running
2021-07-24 17:20:45,372 - INFO : =============================iter.000000==============================
2021-07-24 17:20:45,372 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:34:05,879 - INFO : start running
2021-07-24 17:34:05,879 - INFO : =============================iter.000000==============================
2021-07-24 17:34:05,879 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:45:32,958 - INFO : start running
2021-07-24 17:45:32,959 - INFO : =============================iter.000000==============================
2021-07-24 17:45:32,959 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:46:55,439 - INFO : start running
2021-07-24 17:46:55,440 - INFO : =============================iter.000000==============================
2021-07-24 17:46:55,440 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:47:47,988 - INFO : start running
2021-07-24 17:47:47,989 - INFO : =============================iter.000000==============================
2021-07-24 17:47:47,989 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:48:44,760 - INFO : start running
2021-07-24 17:48:44,760 - INFO : =============================iter.000000==============================
2021-07-24 17:48:44,760 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:52:36,699 - INFO : start running
2021-07-24 17:52:36,700 - INFO : =============================iter.000000==============================
2021-07-24 17:52:36,700 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:54:57,753 - INFO : start running
2021-07-24 17:54:57,754 - INFO : =============================iter.000000==============================
2021-07-24 17:54:57,754 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 22:00:51,771 - INFO : start running
2021-07-24 22:01:35,038 - INFO : start running
2021-07-24 22:01:55,319 - INFO : start running
2021-07-24 22:01:55,321 - INFO : =============================iter.000000==============================
2021-07-24 22:01:55,321 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 22:01:55,340 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-24 22:01:55,360 - INFO : new submission of 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-24 22:01:55,377 - INFO : new submission of f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-24 22:01:55,401 - INFO : new submission of b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-24 22:01:55,415 - INFO : new submission of 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-24 22:02:56,008 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:02:56,220 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:02:56,424 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:02:56,629 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-24 22:03:56,909 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:03:57,141 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:03:57,312 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:03:57,603 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-24 22:04:57,875 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:04:58,089 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:04:58,298 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:04:58,438 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-26 15:42:52,266 - INFO : start running
2021-07-26 15:42:52,267 - INFO : continue from iter 000 task 00
2021-07-26 15:42:52,267 - INFO : =============================iter.000000==============================
2021-07-26 15:42:52,267 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:42:52,395 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:42:52,471 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:42:52,557 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:42:52,681 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-26 15:44:40,477 - INFO : start running
2021-07-26 15:44:40,478 - INFO : continue from iter 000 task 00
2021-07-26 15:44:40,478 - INFO : =============================iter.000000==============================
2021-07-26 15:44:40,478 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:44:40,603 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:44:40,728 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:44:40,843 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:44:40,980 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-26 15:47:37,001 - INFO : start running
2021-07-26 15:47:37,001 - INFO : continue from iter 000 task 00
2021-07-26 15:47:37,001 - INFO : =============================iter.000000==============================
2021-07-26 15:47:37,001 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:47:37,111 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:47:37,221 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:47:37,346 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:47:37,445 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:19:47,191 - INFO : start running
2021-08-26 15:21:24,977 - INFO : start running
2021-08-26 15:22:22,593 - INFO : start running
2021-08-26 15:22:49,760 - INFO : start running
2021-08-26 15:23:56,044 - INFO : start running
2021-08-26 15:24:13,861 - INFO : start running
2021-08-26 15:24:13,862 - INFO : continue from iter 000 task 00
2021-08-26 15:24:13,862 - INFO : =============================iter.000000==============================
2021-08-26 15:24:13,862 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:24:13,980 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 15:24:14,067 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 15:24:14,174 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 15:24:14,303 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:26:32,980 - INFO : start running
2021-08-26 15:26:32,980 - INFO : continue from iter 000 task 00
2021-08-26 15:26:32,980 - INFO : =============================iter.000000==============================
2021-08-26 15:26:32,980 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:27:56,565 - INFO : start running
2021-08-26 15:27:56,566 - INFO : continue from iter 000 task 00
2021-08-26 15:27:56,566 - INFO : =============================iter.000000==============================
2021-08-26 15:27:56,566 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:35:21,767 - INFO : start running
2021-08-26 15:35:21,767 - INFO : continue from iter 000 task 00
2021-08-26 15:35:21,767 - INFO : =============================iter.000000==============================
2021-08-26 15:35:21,767 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:25,332 - INFO : start running
2021-08-26 15:52:25,333 - INFO : continue from iter 000 task 00
2021-08-26 15:52:25,333 - INFO : =============================iter.000000==============================
2021-08-26 15:52:25,333 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:25,334 - INFO : cannot find key "batch" in machine file, try to use deprecated key "machine_type"
2021-08-26 15:52:58,877 - INFO : start running
2021-08-26 15:52:58,877 - INFO : continue from iter 000 task 00
2021-08-26 15:52:58,877 - INFO : =============================iter.000000==============================
2021-08-26 15:52:58,877 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:58,878 - INFO : cannot find key "batch" in machine file, try to use deprecated key "machine_type"
2021-08-26 15:53:53,236 - INFO : start running
2021-08-26 15:53:53,237 - INFO : continue from iter 000 task 00
2021-08-26 15:53:53,237 - INFO : =============================iter.000000==============================
2021-08-26 15:53:53,237 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:53:53,359 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 15:53:53,447 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 15:53:53,580 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 15:53:53,688 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:55:55,213 - INFO : start running
2021-08-26 15:55:55,214 - INFO : continue from iter 000 task 00
2021-08-26 15:55:55,214 - INFO : =============================iter.000000==============================
2021-08-26 15:55:55,214 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:58:15,563 - INFO : start running
2021-08-26 15:58:15,564 - INFO : continue from iter 000 task 00
2021-08-26 15:58:15,564 - INFO : =============================iter.000000==============================
2021-08-26 15:58:15,564 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:09:54,493 - INFO : start running
2021-08-26 16:10:23,189 - INFO : start running
2021-08-26 16:11:30,986 - INFO : start running
2021-08-26 16:12:13,422 - INFO : start running
2021-08-26 16:12:37,819 - INFO : start running
2021-08-26 16:12:43,941 - INFO : start running
2021-08-26 16:12:43,941 - INFO : continue from iter 000 task 00
2021-08-26 16:12:43,941 - INFO : =============================iter.000000==============================
2021-08-26 16:12:43,941 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:15:20,562 - INFO : start running
2021-08-26 16:15:20,563 - INFO : continue from iter 000 task 00
2021-08-26 16:15:20,563 - INFO : =============================iter.000000==============================
2021-08-26 16:15:20,563 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:15:31,998 - INFO : start running
2021-08-26 16:15:31,998 - INFO : continue from iter 000 task 00
2021-08-26 16:15:31,998 - INFO : =============================iter.000000==============================
2021-08-26 16:15:31,998 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:18:01,501 - INFO : start running
2021-08-26 16:18:01,502 - INFO : continue from iter 000 task 00
2021-08-26 16:18:01,502 - INFO : =============================iter.000000==============================
2021-08-26 16:18:01,502 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:18:34,938 - INFO : start running
2021-08-26 16:18:51,220 - INFO : start running
2021-08-26 16:18:51,220 - INFO : continue from iter 000 task 00
2021-08-26 16:18:51,221 - INFO : =============================iter.000000==============================
2021-08-26 16:18:51,221 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:19:37,939 - INFO : start running
2021-08-26 16:19:37,940 - INFO : continue from iter 000 task 00
2021-08-26 16:19:37,940 - INFO : =============================iter.000000==============================
2021-08-26 16:19:37,941 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:19:38,037 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:19:38,122 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:19:38,264 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:19:38,388 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:30:27,385 - INFO : start running
2021-08-26 16:30:27,386 - INFO : continue from iter 000 task 00
2021-08-26 16:30:27,386 - INFO : =============================iter.000000==============================
2021-08-26 16:30:27,386 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:30:27,479 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:30:27,603 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:30:27,739 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:30:27,872 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:33:27,441 - INFO : start running
2021-08-26 16:33:27,441 - INFO : continue from iter 000 task 00
2021-08-26 16:33:27,441 - INFO : =============================iter.000000==============================
2021-08-26 16:33:27,442 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:33:27,528 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:33:27,606 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:33:27,737 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:33:27,817 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:37:12,955 - INFO : start running
2021-08-26 16:37:12,955 - INFO : continue from iter 000 task 00
2021-08-26 16:37:12,955 - INFO : =============================iter.000000==============================
2021-08-26 16:37:12,955 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:40:04,885 - INFO : start running
2021-08-26 16:40:04,886 - INFO : continue from iter 000 task 00
2021-08-26 16:40:04,886 - INFO : =============================iter.000000==============================
2021-08-26 16:40:04,886 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:40:05,016 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:40:05,139 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:40:05,244 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:40:05,330 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:16:30,001 - INFO : start running
2021-08-26 17:16:30,002 - INFO : continue from iter 000 task 00
2021-08-26 17:16:30,002 - INFO : =============================iter.000000==============================
2021-08-26 17:16:30,002 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:16:30,088 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:16:30,169 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:16:30,290 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:16:30,375 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:20:23,491 - INFO : start running
2021-08-26 17:20:23,491 - INFO : continue from iter 000 task 00
2021-08-26 17:20:23,491 - INFO : =============================iter.000000==============================
2021-08-26 17:20:23,491 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:20:23,591 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:20:23,690 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:20:23,771 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:20:23,858 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:23:07,873 - INFO : start running
2021-08-26 17:23:07,873 - INFO : continue from iter 000 task 00
2021-08-26 17:23:07,873 - INFO : =============================iter.000000==============================
2021-08-26 17:23:07,874 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:23:07,992 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:23:08,125 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:23:08,266 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:23:08,406 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:33:49,911 - INFO : start running
2021-08-26 17:33:49,912 - INFO : continue from iter 000 task 00
2021-08-26 17:33:49,912 - INFO : =============================iter.000000==============================
2021-08-26 17:33:49,912 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:33:50,008 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:33:50,088 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:33:50,257 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:33:50,372 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:34:11,435 - INFO : start running
2021-08-26 17:34:11,436 - INFO : continue from iter 000 task 00
2021-08-26 17:34:11,436 - INFO : =============================iter.000000==============================
2021-08-26 17:34:11,436 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:34:11,518 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:34:11,621 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:34:11,714 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:34:11,872 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
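A long dpgen.log like this is easier to read once the resubmission messages are tallied per job ID. The following is only a minimal sketch, assuming the "job <id> terminated, submit again" wording seen in this log; the helper name count_resubmissions is hypothetical and not part of dpgen.

```python
import re
from collections import Counter

# Matches messages such as:
#   ... - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
# The 36-character pattern is the textual form of a UUID.
RESUBMIT = re.compile(r"job ([0-9a-f-]{36}) terminated, submit again")


def count_resubmissions(log_text):
    """Return a Counter mapping job ID -> number of resubmission messages."""
    return Counter(RESUBMIT.findall(log_text))


# Example use (hypothetical path, not executed here):
#   with open("dpgen.log") as fh:
#       for job_id, n in count_resubmissions(fh.read()).items():
#           print(job_id, n)
```

Applied to this log, the tally is three resubmissions for each of the four training jobs on 2021-07-24. The log only records that the jobs terminated, not why, so the actual cause has to come from the job's own output.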
The machine.json is:
{
  "_comment": "training on localhost ",
  "_comment" : "This is for DeePMD-kit 1.*",
  "train_command" : "/home/jiangjun/miniconda3/bin/dp",
  "train_machine": {
    "batch": "shell",
    "work_path" : "/home/jiangjun/cptwokile/1/c1/data"
  },
  "train_resources": {
    "envs": {}
  },
  "_comment": "model_devi on localhost ",
  "lmp_command": "/usr/bin/lmp_mpi",
  "model_devi_group_size": 5,
  "model_devi_machine": {
    "batch": "shell",
    "_comment" : "If lazy_local is true, calculations are done directly in current folders.",
    "lazy_local" : true
  },
  "model_devi_resources": {
  },
  "_comment": "fp on localhost ",
  "fp_command": "/home/jiangjun/cptwokile/cp2k-8.1/exe/local/cp2k.ssmp",
  "fp_group_size": 2,
  "fp_machine": {
    "batch": "shell",
    "work_path" : "/home/jiangjun/cptwokile/1/c1/data",
    "_comment" : "that's all"
  },
  "fp_resources": {
    "module_list": ["mpi"],
    "task_per_node": 4,
    "with_mpi": true,
    "_comment": "that's all"
  },
  "_comment": " that's all "
}
Do I need to do anything special for the batch type shell? My machine is a local single server without LSF or any other scheduler.
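One quick sanity check for a local setup is whether the three commands named in machine.json resolve to executables on this machine. This is a minimal sketch, assuming that the shell batch type simply runs those commands locally; missing_commands is a hypothetical helper, not a dpgen function.

```python
import shutil


def missing_commands(machine, keys=("train_command", "lmp_command", "fp_command")):
    """Return {key: command} for each listed command that is not an executable."""
    missing = {}
    for key in keys:
        command = machine.get(key)
        if not command:
            continue  # key absent or empty: nothing to check
        # Only the first word is the program; anything after it is arguments.
        program = command.split()[0]
        # shutil.which returns None when the program is absent or not executable.
        if shutil.which(program) is None:
            missing[key] = command
    return missing


# Example use (not executed here):
#   import json
#   with open("machine.json") as fh:
#       print(missing_commands(json.load(fh)))
```

An empty result only shows that the binaries exist and are executable. It does not show that they were built with the features dpgen needs.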
Maybe the error is printed to train.log, I guess. I'm facing a similar error and trying to debug it... qaq
It seems to happen during model training. Please check train.log in your work path /home/jiangjun/cptwokile/1/c1/data/ for the specific error information.
Thanks for your reply. I checked train.log, and the error is:
Traceback (most recent call last):
  File "/home/jiangjun/miniconda3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/main.py", line 73, in main
    train(args)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 87, in train
    _do_work(jdata, run_opt)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 140, in _do_work
    model.build (data, stop_batch)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/Trainer.py", line 215, in build
    assert (self.ntypes >= data.get_ntypes()), "ntypes should match that found in data"
AssertionError: ntypes should match that found in data
I don't understand what this message means. I hope you can help me, and thanks again for your reply.
Make sure that the number of atom types in your training data (you can check it in type_map.raw) is no larger than the length of the type_map parameter in param.json.
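The failing assertion compares the number of types the model was built for with the number found in the data. A rough way to reproduce that comparison by hand is sketched below. It assumes that the data's type count is the largest index in the system's type.raw plus one, which is a guess at how DeePMD-kit counts types; both helper names are hypothetical.

```python
def ntypes_in_type_raw(text):
    """Number of atom types implied by type.raw: largest type index plus one."""
    indices = [int(token) for token in text.split()]
    return max(indices) + 1


def check_ntypes(n_model_types, type_raw_text):
    """Mirror the assertion in the traceback.

    Returns (ok, n_data_types), where ok is True when the model covers
    every atom type that appears in the data.
    """
    n_data_types = ntypes_in_type_raw(type_raw_text)
    return n_model_types >= n_data_types, n_data_types


# Example use (not executed here), with n_model_types taken from the
# training parameters, e.g. the length of type_map or of the descriptor's sel:
#   with open("type.raw") as fh:
#       print(check_ntypes(2, fh.read()))
```

If the first element of the result is False, the training parameters describe fewer types than the data contains, which is the situation the assertion rejects.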
The sel does not match the types; add the number of each type the
Thanks for your reply! But I have met a new error:
2021-08-28 21:29:19,591 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 finished
2021-08-28 21:29:19,597 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 finished
2021-08-28 21:29:19,602 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 finished
2021-08-28 21:29:19,607 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a finished
2021-08-28 21:29:19,613 - INFO : -------------------------iter.000000 task 02--------------------------
2021-08-28 21:29:19,614 - INFO : -------------------------iter.000000 task 03--------------------------
2021-08-28 21:29:19,648 - INFO : -------------------------iter.000000 task 04--------------------------
2021-08-28 21:29:19,649 - INFO : Dispatcher switches to the lazy local mode
2021-08-28 21:29:19,666 - INFO : new submission of 4c01fc60-c9b8-444e-8d5a-635d3f592fac for chunk 2a332aeea901b380d1755cc375d3fcb7e993ecae
2021-08-28 21:30:19,944 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:31:20,209 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:32:20,459 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:34:15,982 - INFO : start running
2021-08-28 21:34:15,983 - INFO : continue from iter 000 task 03
2021-08-28 21:34:15,983 - INFO : =============================iter.000000==============================
2021-08-28 21:34:15,983 - INFO : -------------------------iter.000000 task 04--------------------------
2021-08-28 21:34:15,984 - INFO : Dispatcher switches to the lazy local mode
2021-08-28 21:34:16,075 - INFO : restart from old submission 4c01fc60-c9b8-444e-8d5a-635d3f592fac for chunk 2a332aeea901b380d1755cc375d3fcb7e993ecae
May I ask how to solve this problem? Thanks for your reply!
ERROR: Unknown pair style deepmd (../force.cpp:262)
Last command: pair_style deepmd ../graph.000.pb ../graph.001.pb ../graph.002.pb ../graph.003.pb out_freq ${THERMO_FREQ} out_file model_devi.out
ERROR: Unknown pair style deepmd (../force.cpp:262) appears in the LAMMPS log.
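Can you make sure that you installed a version of LAMMPS that is compatible with DeePMD-kit? This error means that the LAMMPS executable being run does not have the deepmd pair style built in.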
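One way to check this is to look at what the LAMMPS binary reports about itself. The sketch below assumes that running the binary with -h prints its help text, including the pair styles compiled in, as a whitespace-separated listing; both helper names are hypothetical.

```python
import subprocess


def lists_style(help_text, style="deepmd"):
    """True if `style` appears as a whole word anywhere in the help text."""
    return style in help_text.split()


def lammps_help(binary):
    """Run `<binary> -h` and return everything it prints (stdout plus stderr)."""
    result = subprocess.run([binary, "-h"], capture_output=True, text=True)
    return result.stdout + result.stderr


# Example use (not executed here), with the path taken from machine.json:
#   print(lists_style(lammps_help("/usr/bin/lmp_mpi")))
```

If this prints False for the binary named in lmp_command, that binary was most likely built without the DeePMD-kit pair style, and lmp_command should point at a LAMMPS build that includes it.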
I am facing the same problem. I checked the type_map.raw file and the type_map in the paramter.json file, but the problem was not solved. In addition, I checked the train.log file in the work path, and it shows the result below.
/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 15: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 29: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
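"Argument list too long" is the operating system's E2BIG error: the combined size of the command-line arguments and the environment passed to the program exceeded the system limit. The sketch below estimates that size and compares it with the limit reported by the system. It is a rough approximation only (the kernel also counts pointers and may cap individual strings), and both helper names are hypothetical.

```python
import os


def exec_payload_size(argv, env):
    """Approximate bytes passed to a new program: each string plus its NUL."""
    args = sum(len(os.fsencode(arg)) + 1 for arg in argv)
    # Each environment entry is stored as "KEY=VALUE\0".
    envs = sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items())
    return args + envs


def fits_arg_max(argv, env, limit=None):
    """True if the estimated payload is below the limit (default: SC_ARG_MAX)."""
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")
    return exec_payload_size(argv, env) < limit


# Example use inside the job environment (not executed here):
#   print(exec_payload_size(["dp", "train", "input.json"], dict(os.environ)))
```

Running the size estimate with the environment of the Torque job can show whether a very large environment variable, rather than the dp arguments themselves, is what pushes the call over the limit.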
Please provide your input files to help locate and reproduce the problem.