Clarification for training the network
Hi,
I was looking at the code and noticed that after each update of the labelled training data, you retrain the network from scratch using more labelled data. Am I correct, or am I missing something?
Thanks
hi @Griffintaur
You might be misunderstanding the code.
I do retrain the network from scratch, but not with more labeled data. The data for retraining comes from two sources: 1) a fixed number of labeled samples (kept the same across all iterations), and 2) a growing number of newly predicted unlabeled samples.
Please take another look at the code.
OK. By "more labelled data" I meant the fixed labelled data from the original iteration plus the unlabeled data with pseudo labels (which are selected through the selection criterion).
Yes. The number of labeled samples is fixed, while the number of unlabeled samples with pseudo labels grows.
This is our strategy for semi-supervised learning.
@Yu-Wu https://github.com/Yu-Wu/Exploit-Unknown-Gradually/blob/a3ebea3851ef98e32d4611250c779c84cfbde171/run.py#L75
This is where the number of samples to be selected is determined. It doesn't seem to match the equation in the paper:
m_t = m_{t-1} + p · n_u
Here n_u is the cardinality of the initial unlabeled set.
Can you help me understand how the above code corresponds to this equation?
The equation is the same as in the paper.
m_0 = 0 initially, so m_t = t · p · n_u.
Here `args.EF / 100` is exactly p. For example, p = 0.05 in the paper corresponds to `args.EF = 5` in the code, which is why it is divided by 100 in the implementation.
OK, thanks for the quick explanation, but here `len(u_data)` changes after each run, whereas in the paper n_u is the cardinality of the INITIAL unlabeled set. Let me know if I am wrong.
`len(u_data)` is a fixed number. You can print it at each iteration/step.
`u_data` is generated at the beginning, and no further changes are applied to this variable in the rest of the code.
In each step, we select some `selected_data` (pseudo-labeled data) from `u_data`, but we do not remove that data from the `u_data` set.
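As an illustration of that idea (not the repository's actual code), a top-k confidence selection over a fixed pool could look like this; `scores`, `pred_labels`, and the helper name are hypothetical:

```python
import numpy as np

def select_pseudo_labels(scores, pred_labels, k):
    """Pick the k most confident predictions from the full unlabeled pool.

    scores: confidence for each unlabeled sample (higher = more confident).
    pred_labels: predicted label for each unlabeled sample.
    The pool itself is never shrunk, so a sample chosen at step t can be
    chosen again at step t+1.
    """
    order = np.argsort(-scores)   # most confident first
    chosen = order[:k]
    return chosen, pred_labels[chosen]

scores = np.array([0.9, 0.2, 0.7, 0.4])
preds = np.array([3, 1, 2, 0])
idx, labels = select_pseudo_labels(scores, preds, k=2)
print(idx, labels)  # picks indices 0 and 2 (confidences 0.9 and 0.7)
```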
Since you increase the number of selections at each step, how do you deal with repeatedly selected pseudo-labelled data?
What I want to ask is: suppose a pseudo-labelled sample was selected at step t, and the same sample is selected again at step t+1. Do you select it again, count it toward the number of selections, and update its old pseudo label,
or
do you ignore it because it was already selected at the previous step t?
Data selection at each step is independent, so data selected at step t may also be selected at step t+1. We do not place any constraint on data selection between steps t and t+1.
In practice, much of the data selected at step t will be selected again at step t+1.
Then do you keep the labelled data from step t before adding new data at step t+1, i.e. append the pseudo-labelled data from step t+1 to the step-t labelled set (the combination of the original and pseudo-labelled data)? If so, the same sample could end up with completely different pseudo labels.
Or do you add the pseudo-labelled data from step t+1 only to the original labelled data, discarding the pseudo-labelled data added at step t?
The second one: I add the pseudo-labelled data at step t+1 only to the original labelled data, discarding the pseudo-labelled additions from step t.
This is because the later model has more discriminative ability to judge whether two samples belong to the same person, so I do not keep the pseudo-labeled data from previous steps.
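Under that scheme, each step rebuilds the training set from the original labeled data plus only the current step's selections. A rough sketch, with hypothetical names and toy data, assuming each sample is an (image, label) pair:

```python
def build_training_set(l_data, u_data, selected_idx, pseudo_labels):
    """Combine the fixed labeled set with this step's pseudo-labeled picks.

    Previous steps' pseudo labels are discarded: the later (stronger) model
    re-labels everything, so a sample may receive a different pseudo label
    than it had at an earlier step.
    """
    pseudo = [(u_data[i], lab) for i, lab in zip(selected_idx, pseudo_labels)]
    return list(l_data) + pseudo

l_data = [("img_a", 0), ("img_b", 1)]    # fixed labeled samples
u_data = ["img_u0", "img_u1", "img_u2"]  # fixed unlabeled pool
train = build_training_set(l_data, u_data, [0, 2], [1, 0])
print(len(train))  # 2 labeled + 2 pseudo-labeled = 4
```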