RFBNet

ConnectionResetError

Open vickersmith opened this issue 7 years ago • 3 comments

The following error occurred during training, but training then resumed on its own. I used 1 GPU and num_workers=4. Does anyone know the cause of this error?

ConnectionResetError: [Errno 104] Connection reset by peer

vickersmith avatar Nov 29 '18 02:11 vickersmith

Same error here, with num_workers=1, batch_size=16, ngpu=1.

Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7feb4a51aba8>>
Traceback (most recent call last):
  File "/home/ubuntu/XXX/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/ubuntu/XXX/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/XXX/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 494, in Client
    deliver_challenge(c, authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge
    response = connection.recv_bytes(256)  # reject large message
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

haochange avatar Dec 12 '18 05:12 haochange

Have you solved this? I ran into the same problem, although the program keeps running. @vickersmith @haochange

zhaowujie avatar Dec 18 '18 02:12 zhaowujie

Your program has a bug somewhere. Check it carefully.

wangkangnian avatar Dec 26 '18 02:12 wangkangnian
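
A note on why training kept going: the traceback above begins with "Exception ignored in" and the failing frame is a `__del__` method. Python does not propagate exceptions raised inside `__del__`; it prints them to stderr and carries on. The sketch below reproduces only that behaviour with the standard library. `FakeLoaderIter` and `run_training_steps` are hypothetical names made up for illustration, not PyTorch code, and the sketch does not attempt to explain why the worker connection was reset in the first place.

```python
import gc


class FakeLoaderIter:
    """Hypothetical stand-in for a data loader iterator (not PyTorch code)."""

    def __del__(self):
        # Simulate a worker connection failing while the iterator is cleaned up.
        raise ConnectionResetError(104, "Connection reset by peer")


def run_training_steps(num_steps=3):
    """Run a dummy loop that destroys one fake iterator per step."""
    completed = []
    for step in range(num_steps):
        it = FakeLoaderIter()
        # Dropping the last reference runs __del__; the exception it raises is
        # reported on stderr as "Exception ignored in: ..." and not re-raised.
        del it
        gc.collect()
        completed.append(step)
    return completed


steps = run_training_steps()
print(steps)
```

Every step completes even though each finalizer raises, which would be consistent with the reports above that the program continues after the message is printed.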