Added `c_in` scaling and same seeds for student and teacher
This PR makes two small changes that may have been affecting performance.

First, I make sure that the same seed is used before the teacher and the student forward passes, so that the dropout masks are exactly the same, as discussed in the iCT paper:
> We additionally ensure that the random number generators for dropout share the same states across the student and teacher networks when optimizing the CT objective in Eq. (5).
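To illustrate the idea, here is a minimal sketch using Python's `random` module (the actual change would save and restore the framework's RNG state, e.g. the torch RNG, around the two passes; `dropout_mask` is a hypothetical helper, not code from this repo):

```python
import random

def dropout_mask(n, p, seed):
    """Bernoulli keep-mask of length n with drop probability p."""
    rng = random.Random(seed)
    return [0 if rng.random() < p else 1 for _ in range(n)]

# Re-seeding with the same value before the teacher pass and the
# student pass yields identical dropout masks, as the iCT paper requires.
seed = 1234
teacher_mask = dropout_mask(8, 0.3, seed)
student_mask = dropout_mask(8, 0.3, seed)
assert teacher_mask == student_mask
```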
Second, I have added input scaling, as I mentioned in issue #7. In this case I used the `c_in` factor that appears in the EDM repository and in this repository, and that is also mentioned in the new OpenAI paper.
For the sigma scaling, I followed the EDM repository as well.
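For reference, this is the input-scaling factor I mean, written as a small sketch (the `sigma_data` value is the usual EDM default and an assumption here):

```python
import math

SIGMA_DATA = 0.5  # EDM default; an assumption for this sketch

def c_in(sigma, sigma_data=SIGMA_DATA):
    # EDM input preconditioning: keeps the variance of the scaled
    # noisy input roughly constant across noise levels.
    return 1.0 / math.sqrt(sigma**2 + sigma_data**2)

# The network would then consume c_in(sigma) * x_noisy instead of x_noisy.
```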
I have not been able to run the notebook because of problems in my pc, but I hope there is no problem in the small implementation I have done.
Maybe, to be more consistent with the rest of the implementation, `c_in` should be updated to:

```python
1 / ((sigma - sigma_min)**2 + sigma_data**2)**0.5
```
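As a sketch, the shifted variant would look like this (the default values for `sigma_min` and `sigma_data` are illustrative assumptions, matching the usual CM/EDM defaults):

```python
import math

def c_in_shifted(sigma, sigma_min=0.002, sigma_data=0.5):
    # Variant of c_in with sigma shifted by sigma_min, mirroring the
    # (sigma - sigma_min) shift used in the consistency-model
    # c_skip / c_out boundary conditions.
    return 1.0 / math.sqrt((sigma - sigma_min)**2 + sigma_data**2)

# At sigma == sigma_min this reduces to 1 / sigma_data, i.e. the
# scaling is at its maximum exactly at the boundary.
```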
Any thoughts about that?