context_mask = (-1*(1-context_mask)) # need to flip 0 <-> 1
Why is it context_mask = (-1*(1-context_mask)) # need to flip 0 <-> 1 and not context_mask = (1-context_mask)?
I will ask the same question: why not (1-context_mask)?
I think there are errors on lines 170 and 290 that prevent the program from running correctly:
line 170: eps = (1+guide_w)*eps1 - guide_w*eps2 ===>>> eps = guide_w*eps1 + (1-guide_w)*eps2
line 290: context_mask = (-1*(1-context_mask)) ===>>> context_mask = 1 - context_mask
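For reference, a minimal sketch (not taken from the repo) contrasting the two combinations, assuming eps1 is the class-conditional prediction and eps2 the unconditional (context-dropped) one:

```python
import torch

# Placeholder predictions; in practice these come from the denoising network
# run with and without the class context.
eps1 = torch.randn(4, 1, 28, 28)   # conditional prediction (assumed)
eps2 = torch.randn(4, 1, 28, 28)   # unconditional prediction (assumed)
guide_w = 0.5

# Combination used on line 170: the standard classifier-free guidance form,
# eps_uncond + (1 + w) * (eps_cond - eps_uncond) rewritten; guide_w = 0 recovers eps1.
eps_guided = (1 + guide_w) * eps1 - guide_w * eps2

# Proposed alternative: a convex blend that only interpolates between the two
# predictions, so it cannot push beyond the conditional prediction itself.
eps_blend = guide_w * eps1 + (1 - guide_w) * eps2
```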
sorry for the delay and confusion. I'll explain my rationale. I wanted context_mask == 1 to mean that we mask out the context for that sample. BUT, when multiplying the class label by the context mask, c = c * context_mask, we need context_mask == 0 to mask it out. Hence, applying context_mask = (-1*(1-context_mask)) effectively inverts (or flips) the mask.
I probably should rename this to context_mask_invert = (-1*(1-context_mask)) then do c = c * context_mask_invert to avoid confusion.
Note applying context_mask = 1-context_mask would result in negative mask weights, which doesn't make sense.
1 - 0 = 1 and 1 - 1 = 0, so why would context_mask = 1-context_mask result in negative mask weights?
Yes, the 0/1 flip is confusing. One simple solution is to just generate the random Bernoulli values at probability (1 - drop_prob) instead of drop_prob. The function torch.bernoulli takes the positive probability, not the negative one, so it'll be clearer and more coherent to use (1 - drop_prob).
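A minimal sketch of that suggestion (hypothetical names, not the repo's code): sample a keep mask directly at probability 1 - drop_prob, so no flipping is needed before multiplying the one-hot labels:

```python
import torch
import torch.nn.functional as F

drop_prob = 0.1
n, n_classes = 8, 10
labels = torch.randint(0, n_classes, (n,))
c = F.one_hot(labels, n_classes).float()              # (n, n_classes) one-hot context

# 1 = keep context, 0 = drop context, sampled directly at probability 1 - drop_prob
keep_mask = torch.bernoulli(torch.full((n, 1), 1 - drop_prob))

c = c * keep_mask                                      # dropped rows become all zeros
```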
Hi @TeaPearce let's say context_mask = [0,1], then (-1*(1-context_mask)) results in [-1, 0]. Multiplying this with the original context, c, you get negative values for your one-hot encoding. Is this intentional?
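To make the arithmetic in this thread concrete, a tiny illustrative example (values chosen arbitrarily) showing what each flip does to a 0/1 drop mask and to a one-hot label:

```python
import torch

context_mask = torch.tensor([0., 1.])      # 0 = keep context, 1 = drop context

print(-1 * (1 - context_mask))             # tensor([-1., 0.]) -> kept sample gets weight -1
print(1 - context_mask)                    # tensor([ 1., 0.]) -> kept sample gets weight 1

c = torch.tensor([[0., 1., 0.],            # one-hot labels for two samples
                  [1., 0., 0.]])

# With the -1*(1-mask) flip, the kept sample's one-hot becomes negative:
print(c * (-1 * (1 - context_mask)).unsqueeze(-1))   # rows: [0., -1., 0.] and [0., 0., 0.]
# With the plain 1-mask flip, the kept sample is unchanged and the dropped one is zeroed:
print(c * (1 - context_mask).unsqueeze(-1))          # rows: [0.,  1., 0.] and [0., 0., 0.]
```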