Open3D-PointNet2-Semantic3D

[Problem] Training step

Open williamlai3a opened this issue 7 years ago • 4 comments

Hi,

I followed the instructions and ran the training with python train.py, using the default setting max_epoch=500. At the end of epoch 499, the following error appeared:

max_epoch 500
**** EPOCH 499 ****
2019-01-22 17:23:39.480862
Progress: [##########] 100%mean loss: 0.062824
Overall accuracy : 0.993542
Average IoU : 0.966070
IoU of man-made terrain : 0.978290
IoU of natural terrain : 0.991271
IoU of high vegetation : 0.995123
IoU of low vegetation : 0.932481
IoU of buildings : 0.994296
IoU of hard scape : 0.950104
IoU of scanning artifact : 0.926501
IoU of cars : 0.960493
(tf) william@william-Ubuntu:/media/william/E/Open3D-PointNet2-Semantic3D$ Process ForkPoolWorker-1:1:
Traceback (most recent call last):
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/pool.py", line 125, in worker
    put((job, i, result))
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/pool.py", line 130, in worker
    put((job, i, (False, wrapped)))
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-1:5:
Traceback (most recent call last):
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/pool.py", line 125, in worker
    put((job, i, result))
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/pool.py", line 130, in worker
    put((job, i, (False, wrapped)))
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/media/william/E/anaconda3/envs/tf/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Could anyone please advise on what might have gone wrong? Thanks. William

williamlai3a avatar Jan 22 '19 09:01 williamlai3a

Looks like the multiprocessing Pool or Queue worker processes for dataset pre-fetching are not properly terminated at the end of the training. Luckily, this won't affect the training results, as it only happens at the end when the training is done. Will need to fix by properly terminating the worker processes.
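
For illustration only, here is a minimal sketch of what shutting down the workers cleanly could look like with the standard multiprocessing module. It is not the repository's actual code: load_batch and run_epochs are hypothetical stand-ins for the pre-fetching job and the training loop, and the "fork" start method is assumed because the traceback shows ForkPoolWorker processes on Linux.

```python
import multiprocessing as mp


def load_batch(index):
    # Hypothetical stand-in for a dataset pre-fetching job.
    return [index] * 4


def run_epochs(num_epochs, num_workers=2, start_method="fork"):
    # Assumption: "fork" matches the ForkPoolWorker names in the traceback
    # (Unix only). Other start methods need an importable main module.
    ctx = mp.get_context(start_method)
    pool = ctx.Pool(processes=num_workers)
    totals = []
    try:
        for _ in range(num_epochs):
            # Queue the batches asynchronously, as a pre-fetcher would.
            pending = [pool.apply_async(load_batch, (i,)) for i in range(3)]
            batches = [p.get() for p in pending]
            totals.append(sum(sum(b) for b in batches))
        # Normal exit: stop accepting jobs and let the workers finish.
        pool.close()
    except BaseException:
        # Error or Ctrl+C: stop the workers immediately.
        pool.terminate()
        raise
    finally:
        # Wait for the workers to exit before the parent process returns,
        # so none of them is left writing to a closed pipe.
        pool.join()
    return totals


if __name__ == "__main__":
    print(run_epochs(2))
```

The point of the sketch is the ordering: close() (or terminate() on failure) is always followed by join(), so the parent does not exit while workers are still trying to send results back.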

yxlao avatar Jan 24 '19 08:01 yxlao

Thanks for your kind reply. Yes, I noticed that it won't affect the training results, as each best model has been saved. I would like to ask one more thing: if the training process is interrupted, say at epoch 324, is there any flag or parameter to make train.py resume training from epoch 324, or from the last saved model?

Thanks!

williamlai3a avatar Jan 24 '19 15:01 williamlai3a

Can you tell me which TensorFlow version you used?

yulongyu avatar Apr 01 '19 00:04 yulongyu

Hi @yxlao @yulongyu , I am also having the same issue. Is there any update on it?

Also, regarding the question above about which TensorFlow version was used: I am using tensorflow-gpu version 1.12.0.

kartik144 avatar Apr 26 '19 06:04 kartik144