About the training time of the Prim agent
Hi, thanks for sharing the source code of this wonderful project. I have a question following issue #1. I have recently been training both the Prim and Mesh agents on the airplane dataset using the depth-image input. However, it is taking a very long time and I am not sure whether the training is actually effective. So far, each Prim-agent epoch takes 0.85 hours, and I am running it out to 140 epochs (more than 4 days, as you mentioned in #1). However, from what I can see, the reward curves are not converging. I also compared the currently trained model against the pretrained Prim-agent weights (comparing mean rewards with test.py), and the mean reward of my model is still quite poor. Do you happen to have any ideas about what I might be doing wrong?
thank you
Hey, could you show me your training curve for the reward? How about the reward during IL? Let's first confirm whether the IL stage learns a good policy.
Hi, thanks for the reply.
Here are the IL reward training curves for the airplane and car datasets:
Based on these plots, I don't think the IL learned a good policy, because I would expect the agent to improve its reward gradually.
But maybe I am misunderstanding this?
thank you
I have re-trained the imitation learning part for the airplane dataset, and the reward training curve looks similar.
I remember I encountered an issue with the IL process where the RL process did not start from a good policy. I'm not sure whether this is also the cause of your issue, but I hope it helps.
Usually, for the first DAgger IL iteration of each shape, the reward is exactly the same as the expert's, so it is very high; but sometimes the agent learns nothing over the next few DAgger steps. It is a strange phenomenon, and my guess is that it is caused by the abnormal gradients the agent starts with (though I'm not sure). What I did was watch the IL process for the first shape. Since the shapes are trained one by one, if the last DAgger iteration of the first shape reaches a reward comparable to the expert's, the subsequent learning should proceed normally too. If not, just stop the IL process and restart IL training.
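The observe-and-restart strategy above can be sketched as a small loop. This is only a hypothetical illustration: `train_il_first_shape` and `expert_reward` are placeholder names for this project's own IL routine and expert baseline, not actual functions in the repository.

```python
def il_with_restart(train_il_first_shape, expert_reward,
                    ratio=0.9, max_restarts=5):
    """Restart IL until the last DAgger iteration of the first shape
    reaches a reward comparable to the expert's (within `ratio`).

    `train_il_first_shape(seed)` is assumed to run IL on the first shape
    and return the reward of its last DAgger iteration.
    """
    for attempt in range(max_restarts):
        last_dagger_reward = train_il_first_shape(seed=attempt)
        if last_dagger_reward >= ratio * expert_reward:
            # Good starting policy: continue training the remaining shapes.
            return attempt
    raise RuntimeError("IL never matched the expert; inspect the gradients")
```

In practice this is just what I did by hand: kill the run and restart it with a fresh initialization whenever the first shape fails to catch up to the expert.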
I found a picture of the overall reward curve on my computer. The correct curve should look like this: the reward jitters, but on average it stays at a high value.
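Since the raw curve is noisy, one simple way to judge whether the reward really "stays high on average" (rather than trending downward) is to smooth the logged rewards with a moving average before plotting. A minimal sketch, independent of the project's own logging code:

```python
def moving_average(rewards, window=50):
    """Smooth a list of per-step rewards with a trailing moving average.

    Early entries average over however many values exist so far, so the
    output has the same length as the input.
    """
    out = []
    for i in range(len(rewards)):
        lo = max(0, i - window + 1)
        chunk = rewards[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

If the smoothed curve plateaus near the expert reward, the jitter in the raw curve is nothing to worry about.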
Hello, I have recently been training both the Prim and Mesh agents on the airplane and guitar datasets with depth-image input as well. But I am also not sure whether the training is effective, because the reward curves do not converge as described above. In short, I have the same problem as you.
Luckily, I found the same question you raised above, so I'd like to ask whether you ever successfully reproduced the work in this paper. Looking forward to your reply.
thank you
Hi! Did you try the strategy I mentioned above?
Yes, I did.