--continue_iter is buggy
I ran the following command: python3 main.py --task AntBandits-v1 --num_subs 2 --macro_duration 1000 --num_rollouts 2000 --warmup_time 20 --train_time 50 --continue_iter 00615 --replay True AntAgent
The first thing I noticed is that you need to write leading zeros in front of the iteration number (00615 rather than 615). The second is that I had to copy the files from "savedir" to the "AntAgent" folder to make it work. I guess the checkpointing code uses the wrong directory for storing checkpoints.
The first output also shows "It is Iteration 0 so i'm changing [...]", but I wanted to continue the learning process, not start from the beginning.
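The leading zeros can be generated rather than typed by hand. This is a minimal sketch, assuming iteration numbers are always padded to five digits as in 00615; the helper name `pad_iteration` and the default width are assumptions, not part of the repository.

```python
def pad_iteration(iteration, width=5):
    """Return the iteration number as a zero-padded string.

    The default width of 5 is an assumption inferred from the value
    00615 used in the command above.
    """
    iteration = int(iteration)
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return str(iteration).zfill(width)


print(pad_iteration(615))   # 00615
print(pad_iteration("15"))  # 00015
```

The result could then be passed as the value of --continue_iter.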
Thanks for the report, I'll look into it. For now: although the output shows "iteration 0", it is continuing the learning process from the checkpoint.
Hi! Do you run this code on a GPU? My computer has two TITAN Xp GPUs. When I run this code, the utilization of the first one is only 5%, and the second one is at zero. Do you know why my GPU utilization is so low? Do I need to modify the code to match the configuration of my computer? Thanks!
@Muguangfeng
Well, you kind of hijacked this topic, but I will answer anyway. It probably depends on which variant of TensorFlow is installed. It sounds like you have installed the default variant. If I remember correctly, the default variant only uses the CPU (see also https://www.tensorflow.org/install/gpu).
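A quick way to tell which variant is installed is to ask TensorFlow whether it was built with CUDA support. This is only a sketch; the helper name `tensorflow_build_info` is made up for illustration, and it returns None when TensorFlow is not installed at all.

```python
def tensorflow_build_info():
    """Report the TensorFlow version and whether the build supports CUDA.

    Returns None if TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        # False means a CPU-only build, which can never use the GPU.
        "built_with_cuda": bool(tf.test.is_built_with_cuda()),
    }


print(tensorflow_build_info())
```

If "built_with_cuda" is False, the CPU-only package is installed and the GPU will stay idle regardless of the code.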
So sorry! Do you mean that this code can only run on the CPU? To save time, I want to use the GPU for acceleration. I have seen this tutorial for installing tensorflow-gpu before, and I installed tensorflow-gpu 1.8.0. The code runs, but it is no faster because GPU utilization is low.
In addition, after training, I tried to view the training process by running: python main.py --task AntBandits-v1 --num_subs 2 --macro_duration 1000 --num_rollouts 2000 --warmup_time 20 --train_time 30 --replay True --continue_iter 00015 AntAgent. It cannot find the file. The folder containing the file is savedir/Antagent/checkpoints/. Is that wrong?
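One thing worth ruling out is a difference in capitalization between the name given on the command line (AntAgent) and the folder on disk (Antagent as written above), which matters on case-sensitive file systems. This is a sketch under that assumption only; the helper name `find_dir_ignoring_case` is hypothetical and the actual lookup logic in the repository may differ.

```python
import os


def find_dir_ignoring_case(parent, name):
    """Return directories inside `parent` whose name equals `name` ignoring case."""
    if not os.path.isdir(parent):
        return []
    return [
        os.path.join(parent, entry)
        for entry in sorted(os.listdir(parent))
        if entry.lower() == name.lower()
        and os.path.isdir(os.path.join(parent, entry))
    ]


# "savedir" and "AntAgent" are taken from the command and path quoted above.
for path in find_dir_ignoring_case("savedir", "AntAgent"):
    exact = os.path.basename(path) == "AntAgent"
    print(path, "(exact match)" if exact else "(capitalization differs)")
```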
Does this:
import tensorflow as tf
print(tf.test.is_gpu_available())
also return True for you?
OK, it looks like the current code does not set the GPU device count. See email for more details.
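For readers without access to that email: in TensorFlow 1.x, the number of devices a session may use can be passed through `tf.ConfigProto(device_count=...)`. The sketch below assumes that this is what "GPU device count" refers to; it is not the fix that was applied to this repository, and the helper name `make_session_config` is hypothetical.

```python
def make_session_config(num_gpus=1, log_placement=True):
    """Build a session config that allows `num_gpus` GPUs.

    Returns None if TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.ConfigProto is the TensorFlow 1.x name; newer releases expose it
    # under tf.compat.v1 instead.
    config_cls = getattr(tf, "ConfigProto", None)
    if config_cls is None:
        config_cls = tf.compat.v1.ConfigProto
    return config_cls(
        device_count={"GPU": num_gpus},
        # Logs which device each op is placed on, to confirm GPU usage.
        log_device_placement=log_placement,
    )


config = make_session_config(num_gpus=2)
# With TensorFlow 1.x this would be passed as tf.Session(config=config).
```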