Incorrect array indexes/sizes in the controller's trainer
Hi Zac,
There is a bug, possibly a chain of cascading bugs, in the controller training script. Specifically, if controller_num_test_episode is greater than controller_num_episode, the following error occurs (in CarRacing):
Track generation: 1180..1479 -> 299-tiles track
Track generation: 1184..1484 -> 300-tiles track
Track generation: 1016..1274 -> 258-tiles track
Traceback (most recent call last):
  File "train.py", line 451, in <module>
    main(args)
  File "train.py", line 422, in main
    slave()
  File "train.py", line 193, in slave
    result_packet = encode_result_packet(results)
  File "train.py", line 137, in encode_result_packet
    r = np.concatenate([r, np.zeros(RESULT_PACKET_SIZE - eval_packet_size)-1.0], axis=0)
ValueError: negative dimensions are not allowed
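The ValueError comes from np.zeros being called with a negative length, which happens when eval_packet_size exceeds RESULT_PACKET_SIZE. A minimal sketch of a padding step that fails with a clearer message is shown here. It is not the repo's code: pad_packet and its parameters are hypothetical names, and how RESULT_PACKET_SIZE should really be derived from the config is not covered.

```python
import numpy as np


def pad_packet(values, packet_size, fill=-1.0):
    """Pad a 1-D result vector with `fill` up to a fixed packet size.

    Hypothetical helper for illustration only. It raises a descriptive
    error instead of letting NumPy fail on a negative padding length.
    """
    values = np.asarray(values, dtype=float)
    if values.size > packet_size:
        raise ValueError(
            f"result has {values.size} entries but the packet holds only {packet_size}"
        )
    # Same padding value (-1.0) as the line shown in the traceback.
    padding = np.full(packet_size - values.size, fill)
    return np.concatenate([values, padding], axis=0)
```

A check like this would not fix the underlying size mismatch, but it would point at the cause directly rather than at the padding call.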
The error can be reproduced by checking out the latest main branch of the repo, changing the CarRacing config file as shown below, and then running only the trainer (no need to re-train the VAE or RNN):
export CONFIG_PATH=configs/carracing.config
CUDA_VISIBLE_DEVICES=-1 xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- nice python train.py -c $CONFIG_PATH
Controller part of the config file:
controller_optimizer=cma
controller_num_episode=2
controller_num_test_episode=3
controller_eval_steps=4
controller_num_worker=10
controller_num_worker_trial=1
controller_antithetic=0
controller_cap_time=0
controller_retrain=0
controller_seed_start=0
controller_sigma_init=0.1
controller_sigma_decay=0.999
controller_batch_mode=mean
The evaluation results read from the workers could also be affected by this (see train.py, lines 219-220), because the orchestrator process reads, and possibly over-reads, num_episode items from each result, whereas there may be only num_test_episode items to read:
reward_list_total[idx, :num_episode] = result[2]
reward_list_total[idx, num_episode:] = result[3]
This could skew the mean reward for a particular batch and affect training performance and model accuracy. It should affect the Doom experiment as well, although I haven't tested it. A quick workaround is to set controller_num_test_episode and controller_num_episode to the same value, but that is not ideal. I wonder whether fixing this bug would bring your results closer to those of the original paper.
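To illustrate the kind of check I have in mind for lines 219-220, here is a rough sketch that validates both reward lists before writing them into a row of reward_list_total. It assumes result[2] holds the num_episode training rewards, result[3] holds the num_test_episode test rewards, and each row has num_episode + num_test_episode slots. Those are my assumptions, not something I have verified in the code, and store_rewards is a hypothetical name.

```python
import numpy as np


def store_rewards(reward_row, train_rewards, test_rewards,
                  num_episode, num_test_episode):
    """Copy both reward lists into one row after checking their lengths.

    Hypothetical helper: the row layout (training rewards first, then
    test rewards) is an assumption made for this sketch.
    """
    train_rewards = np.asarray(train_rewards, dtype=float)
    test_rewards = np.asarray(test_rewards, dtype=float)
    if reward_row.shape != (num_episode + num_test_episode,):
        raise ValueError(f"unexpected row shape {reward_row.shape}")
    if train_rewards.shape != (num_episode,):
        raise ValueError(
            f"expected {num_episode} training rewards, got {train_rewards.size}"
        )
    if test_rewards.shape != (num_test_episode,):
        raise ValueError(
            f"expected {num_test_episode} test rewards, got {test_rewards.size}"
        )
    reward_row[:num_episode] = train_rewards
    reward_row[num_episode:] = test_rewards
    return reward_row


# Example with the values from the config above (2 training, 3 test episodes).
row = store_rewards(np.zeros(5), [10.0, 20.0], [1.0, 2.0, 3.0], 2, 3)
```

With explicit checks like these, a mismatch between the two episode counts would surface as an error at the point of the copy instead of silently changing the batch statistics.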