
Incorrect array indexes/sizes in the controller's trainer

Open twoletters opened this issue 4 years ago • 0 comments

Hi Zac,

There is a bug, possibly a cascade of bugs, in the controller training script. Specifically, if `controller_num_test_episode` is greater than `controller_num_episode`, the following error occurs (in CarRacing):

```
Track generation: 1180..1479 -> 299-tiles track
Track generation: 1184..1484 -> 300-tiles track
Track generation: 1016..1274 -> 258-tiles track
Traceback (most recent call last):
  File "train.py", line 451, in <module>
    main(args)
  File "train.py", line 422, in main
    slave()
  File "train.py", line 193, in slave
    result_packet = encode_result_packet(results)
  File "train.py", line 137, in encode_result_packet
    r = np.concatenate([r, np.zeros(RESULT_PACKET_SIZE - eval_packet_size)-1.0], axis=0)
ValueError: negative dimensions are not allowed
```
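The failure can be reproduced in isolation. The sketch below is a guess at the sizing logic, not the actual `train.py` code (the per-episode multiplier and the `4 *` payload size are illustrative assumptions): if `RESULT_PACKET_SIZE` is derived from `controller_num_episode` while the evaluation payload grows with `controller_num_test_episode`, the padding length goes negative and `np.zeros` raises exactly this error.

```python
import numpy as np

# Illustrative reconstruction of the packet sizing, NOT the real train.py code.
# The "4 items per episode" payload size is an assumption for demonstration.
num_episode = 2        # controller_num_episode
num_test_episode = 3   # controller_num_test_episode

RESULT_PACKET_SIZE = 4 * num_episode     # sized from the training episode count
eval_packet_size = 4 * num_test_episode  # actual evaluation payload is larger

r = np.arange(eval_packet_size, dtype=float)
pad = RESULT_PACKET_SIZE - eval_packet_size  # -4 here: negative padding length
try:
    r = np.concatenate([r, np.zeros(pad) - 1.0], axis=0)
except ValueError as e:
    print(e)  # prints "negative dimensions are not allowed"
```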

The error can be reproduced by downloading the latest main branch of the repo, altering the CarRacing config file as shown below, and then running the trainer only (no need to re-train the VAE or RNN):

```shell
export CONFIG_PATH=configs/carracing.config
CUDA_VISIBLE_DEVICES=-1 xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- nice python train.py -c $CONFIG_PATH
```

Controller part of the config file:

```
controller_optimizer=cma
controller_num_episode=2
controller_num_test_episode=3
controller_eval_steps=4
controller_num_worker=10
controller_num_worker_trial=1
controller_antithetic=0
controller_cap_time=0
controller_retrain=0
controller_seed_start=0
controller_sigma_init=0.1
controller_sigma_decay=0.999
controller_batch_mode=mean
```

The evaluation results read from the workers could also be affected (see `train.py`, lines 219-220): the orchestrator process over-reads `num_episode` items from the results, whereas there may be only `num_test_episode` items to read:

```python
      reward_list_total[idx, :num_episode] = result[2]
      reward_list_total[idx, num_episode:] = result[3]
```
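A minimal NumPy sketch (toy data, not the actual worker results) of how this class of mismatch surfaces: when the slice width is computed from one episode count but the worker's reward list has the other length, the assignment either raises a broadcast error or, if the payload happens to fit, silently mixes training and test rewards.

```python
import numpy as np

# Toy stand-ins for the orchestrator's reward buffer and a worker result.
num_episode = 2
num_test_episode = 3
reward_list_total = np.zeros((1, num_episode + num_test_episode))

train_rewards = np.array([10.0, 20.0])    # num_episode items
test_rewards = np.array([1.0, 2.0, 3.0])  # num_test_episode items

idx = 0
# If the second slice is (incorrectly) sized with num_episode instead of
# num_test_episode, NumPy refuses the mismatched assignment:
try:
    reward_list_total[idx, num_episode:2 * num_episode] = test_rewards
except ValueError as e:
    print(e)  # broadcast error: 3 items cannot fill a slice of length 2
```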

This could skew the mean reward for a given batch and hurt training performance and model accuracy. It should affect the Doom experiment as well, although I haven't tested it. A quick workaround is to set `controller_num_test_episode` and `controller_num_episode` to the same value, but that is not ideal. I wonder if fixing this bug would get you closer to the results of the original paper.
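One possible fix, sketched here with illustrative names and an assumed per-episode payload size (I haven't tested it against the repo), is to size the result packet for the larger of the two episode counts, so the padding length can never go negative regardless of which config value is bigger:

```python
import numpy as np

# Untested sketch of a fix; names and the 4-per-episode payload are assumptions.
num_episode = 2        # controller_num_episode
num_test_episode = 3   # controller_num_test_episode

# Size the packet for the larger episode count instead of num_episode alone.
PACKET_EPISODES = max(num_episode, num_test_episode)
RESULT_PACKET_SIZE = 4 * PACKET_EPISODES

eval_packet_size = 4 * num_test_episode
pad = RESULT_PACKET_SIZE - eval_packet_size  # now always >= 0
r = np.arange(eval_packet_size, dtype=float)
r = np.concatenate([r, np.zeros(pad) - 1.0], axis=0)
print(r.shape)  # (12,)
```

The orchestrator's read side would need the matching change, slicing each worker result by its actual length rather than by `num_episode`.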

twoletters avatar Aug 09 '21 20:08 twoletters