About the hanging training process
Hi Junyu,
Thank you for your good work on chart extraction.
I followed the README and installed DeepRule properly, and I wanted to test the training code. Everything looks good at the beginning:
['cache', 'pie']
loading all datasets...
using 1 threads
loading from cache file: /home/DeepRule/data/piedata_1008/cache/pie_train2019.pkl
loading annotations into memory...
/home/DeepRule/data/piedata_1008/pie/annotations/instancesPie(1008)_train2019.json
Done (t=1.61s)
creating index...
index created!
loading from cache file: /home/DeepRule/data/piedata_1008/cache/pie_val2019.pkl
loading annotations into memory...
/home/DeepRule/data/piedata_1008/pie/annotations/instancesPie(1008)_val2019.json
Done (t=0.03s)
creating index...
index created!
system config...
{'batch_size': 26,
'cache_dir': '/home/DeepRule/data/piedata_1008/cache',
'chunk_sizes': [5, 7, 7, 7],
'config_dir': './config',
'data_dir': '/home/DeepRule/data/piedata_1008/',
'data_rng': <mtrand.RandomState object at 0x7f1b20d20d38>,
'dataset': 'Pie',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 50000,
'nnet_rng': <mtrand.RandomState object at 0x7f1b20d20d80>,
'opt_algo': 'adam',
'prefetch_size': 5,
'pretrain': None,
'result_dir': './results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CornerNetPurePie',
'stepsize': 45000,
'tar_data_dir': 'cls',
'test_split': 'testchart',
'train_split': 'trainchart',
'val_iter': 100,
'val_split': 'valchart',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 1,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.3,
'gaussian_radius': -1,
'input_size': [511, 511],
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 100,
'weight_exp': 8}
len of db: 73075
building model...
module_file: models.CornerNetPurePie
use kp pure pie
total parameters: 198592652
setting learning rate to: 0.00025
training start...
start prefetching data...
shuffling indices...
['read.txt']
0%| | 0/50000 [00:00<?, ?it/s]
But for some reason, the training hangs without making any progress. I checked the CPU usage and found a zombie process, as shown below:

The GPU usage is shown below:

Since I do not have an Azure account, I commented out line 32 in "/DeepRule/models/CornerNetPurePie.py": # from azureml.core.compute import ComputeTarget
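For anyone in the same situation, an alternative to commenting the line out is an optional-import guard, so the same code still runs on machines that do have the Azure SDK (a sketch; it assumes the import is only needed when Azure compute is actually used):

```python
# Optional-import guard (sketch): keep Azure support when the SDK is
# installed, and fall back gracefully when it is not.
try:
    from azureml.core.compute import ComputeTarget
    HAS_AZUREML = True
except ImportError:
    ComputeTarget = None
    HAS_AZUREML = False
```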
I do not know what the main reason for this is. Please help us. Thank you in advance!
The OCR is replaceable. You can replace it with a local OCR package such as pytesseract (https://pypi.org/project/pytesseract/). However, you need to rewrite the ocr_result function.
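A minimal sketch of what such a replacement could look like, assuming ocr_result takes an image path and should return the recognized words together with their bounding boxes (check the actual signature and return format in this repo and adapt accordingly):

```python
# Sketch of a local OCR replacement based on pytesseract.
# Assumption: ocr_result(image_path) should return a list of
# (word, (left, top, width, height)) tuples; adapt to the real interface.
import pytesseract
from PIL import Image

def ocr_result(image_path):
    # pytesseract also needs the tesseract binary installed on the system,
    # e.g. apt-get install tesseract-ocr.
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    results = []
    for i, word in enumerate(data["text"]):
        if word.strip():  # skip empty detections
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            results.append((word, box))
    return results
```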
Could you explain in more detail?
Hey @tairen99, have you found the reason for this issue? We encountered the same problem.
Edit: By stepping through the execution, we were able to pinpoint the code responsible for the process getting stuck.
The culprit appears to be image = cv2.resize(image, (new_width, new_height)) on line 30 of sample/bar.py (and the corresponding lines in the other files in that directory). We ended up following the suggestions from this thread and inserted multiprocessing.set_start_method('spawn', force=True) at the beginning of train_chart.py. Afterwards, everything worked as expected.
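For anyone hitting the same hang, this is roughly what the fix looks like (a sketch; with 'spawn', the prefetching workers start as fresh interpreters rather than fork() copies, which avoids the deadlock that OpenCV's internal threads can cause in forked children):

```python
# Sketch of the workaround: run this at the very top of train_chart.py,
# before any worker processes are created.
# Note: 'spawn' requires the script's entry point to sit under an
# `if __name__ == '__main__':` guard.
import multiprocessing

multiprocessing.set_start_method('spawn', force=True)
```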