
About the training time of the Prim agent

Open jdily opened this issue 2 years ago • 7 comments

Hi, thanks for sharing the source code of this wonderful project. I have a question following issue #1. I have recently been training both the Prim and Mesh agents on the airplane dataset using depth image input. However, training takes a very long time and I am not sure whether it is actually effective. So far, each Prim-agent epoch takes 0.85 hours, and I am currently running up to 140 epochs (more than 4 days, as you mentioned in #1). However, from what I observe, the reward curves are not converging. I also compared the currently trained model against the pretrained Prim-agent weights (comparing mean rewards using test.py), and the mean reward of the currently trained model is still quite poor. Do you happen to have any ideas about potential mistakes I may have made here?
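For reference, the checkpoint comparison described above can be sketched as averaging per-shape test rewards for each model. This is a minimal hypothetical sketch, not the repo's actual test.py; the function name and the reward numbers are made up for illustration:

```python
# Hypothetical sketch: compare two checkpoints by their mean episode
# reward over the same set of test shapes.
def mean_reward(rewards_per_shape):
    """Average the final episode reward across all test shapes."""
    return sum(rewards_per_shape) / len(rewards_per_shape)

# Made-up per-shape rewards for a freshly trained vs. a pretrained agent.
trained = [0.21, 0.18, 0.25, 0.19]
pretrained = [0.62, 0.58, 0.66, 0.60]

# A large positive gap indicates the new training run is underperforming.
gap = mean_reward(pretrained) - mean_reward(trained)
```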

thank you

jdily avatar Jul 02 '23 11:07 jdily

Hey, could you show me your training curve for the reward? What about the IL reward? Let's first confirm whether the IL stage learns a good policy.

clinplayer avatar Jul 05 '23 10:07 clinplayer

Hi, thanks for the reply. Here are the IL reward training curves for the airplane and car datasets: [image] Based on these plots, I don't think the IL stage learned a good policy, because I would expect the agent to improve its reward gradually. But maybe I am misunderstanding?

thank you

jdily avatar Jul 07 '23 05:07 jdily

I have re-trained the imitation learning part for the airplane dataset, and I think the reward training curves look similar. [image]

jdily avatar Jul 11 '23 12:07 jdily

I remember I encountered an issue with the IL process, where the RL process does not start with a good policy. I'm not sure whether this is also the cause of your issue, but I hope it helps.

Usually, for the first DAgger IL iteration of each shape, the reward is exactly the same as the "expert", so it is very high; but sometimes the agent learns nothing in the next few DAgger steps. It is a weird phenomenon, and I guess it is due to the abnormal gradients the agent starts with (but I'm not sure). What I did was observe the IL process for the first shape. Since the shapes are trained one by one, if the last DAgger iteration of the first shape obtains a reward comparable to the "expert", then the following learning process should perform normally too. If not, just stop the IL process and restart the IL training.
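The restart heuristic above can be sketched as a simple check on the first shape's DAgger rewards. This is a hypothetical illustration, not code from the repo; the function name, tolerance, and reward values are assumptions:

```python
# Hypothetical sketch of the monitoring heuristic: watch the DAgger
# iterations of the FIRST shape, and restart IL training if the final
# iteration's reward falls well below the expert's.
def il_looks_healthy(dagger_rewards, expert_reward, tolerance=0.9):
    """True if the last DAgger iteration of the first shape reaches a
    reward comparable to the expert policy (within `tolerance`)."""
    return dagger_rewards[-1] >= tolerance * expert_reward

# First iteration matches the expert almost exactly in both cases;
# the second run then "learns nothing" in the following DAgger steps.
healthy_run = [0.95, 0.90, 0.88, 0.92]
collapsed_run = [0.95, 0.30, 0.28, 0.25]
```

If `il_looks_healthy` returns False on the first shape, the suggestion above is to stop and restart the IL training rather than continue.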

I found a picture of the overall reward curve on my computer. The correct curve should look like this: the reward jitters, but on average it stays at a high value. [image]

clinplayer avatar Jul 12 '23 09:07 clinplayer

Hello, I have recently been training both the Prim and Mesh agents on the airplane and guitar datasets, also using depth image input. But I am likewise unsure whether the training is effective, because the reward curves do not converge as described above. In short, I have the same problem as you.

Luckily, I see you asked the same question above, so I'd like to ask whether you have successfully reproduced the work in this paper. Looking forward to your reply.

thank you

Miss-wang-maker avatar Nov 01 '23 01:11 Miss-wang-maker


Hi! Did you try the strategy I mentioned above?

clinplayer avatar Nov 22 '23 03:11 clinplayer

> Hi! Did you try the strategy I mentioned above?

Yes, I did.

Miss-wang-maker avatar Nov 22 '23 04:11 Miss-wang-maker