
Importance Sampling

Open · VAndreasV opened this issue 7 years ago · 3 comments

First of all, thanks for this implementation.

One question, however: I see that the importance-sampling weights are calculated and returned when a batch is sampled, but they aren't used anywhere further in the code.

Is this some legacy code from a feature that didn't work? Or is the code not finished yet?

Andreas

VAndreasV · Jul 23 '18

~I think they're just returned for debugging. You could use them to see what kind of weighting each of your samples has.~

Nope, you're totally right! It's a bug. According to the original paper, you should multiply the TD error by the IS weights.

You can do this in the repo code by changing the loss calculation:

loss = (torch.FloatTensor(is_weights) * F.mse_loss(pred, target)).mean()

I'm not sure whether the weighting should go inside the MSE calculation or can be applied outside, as I've done here; by that I mean I don't know whether the weights belong inside the square operation or not. I think outside the MSE is the correct place.
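To make that concrete, here's a rough sketch of the two placements with made-up numbers (none of this is from the repo):

```python
import torch

# Dummy values just to illustrate the shapes; nothing here is from the repo.
is_weights = torch.tensor([0.5, 1.0, 0.25])   # IS weights for a batch of 3
pred       = torch.tensor([1.0, 2.0, 3.0])    # Q(s, a) predictions
target     = torch.tensor([1.5, 1.0, 3.5])    # TD targets

delta = pred - target                          # per-sample TD error

# "Outside the square": weight the squared error, i.e. w_i * delta_i^2.
loss_outside = (is_weights * delta.pow(2)).mean()

# "Inside the square": weight the error before squaring,
# i.e. (w_i * delta_i)^2 = w_i^2 * delta_i^2.
loss_inside = (is_weights * delta).pow(2).mean()
```

The paper scales the TD error delta_i by w_i in its update, which matches the gradient of the "outside" form up to a constant factor, so I believe that's the right one.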

stormont · Feb 21 '19

I think each sample's loss in the minibatch should be multiplied by its corresponding IS weight. However, torch by default returns the MSE loss averaged over the minibatch, so to weight each sample individually you first need per-sample losses, e.g. by setting the reduce flag of torch.nn.functional.mse_loss() to False (or, in newer versions, reduction='none'). I also like to pass the targets and predictions shaped as [ [sample 1], [sample 2], ..., [sample n] ] rather than as a flat [sample 1, sample 2, ..., sample n]; I'm not sure whether that makes a difference for correctness. After the multiplication, the batch of weighted losses can be averaged with tensor.mean(), and then you can call backward() on the result.
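Something like this sketch is what I have in mind (random stand-in tensors, not the repo's actual code; names follow the earlier snippet):

```python
import torch
import torch.nn.functional as F

batch_size = 4
pred       = torch.randn(batch_size, requires_grad=True)  # stand-in for Q(s, a) predictions
target     = torch.randn(batch_size)                       # stand-in for TD targets
is_weights = torch.rand(batch_size)                        # stand-in for IS weights

# Keep one loss per sample instead of the default batch average.
elementwise_loss = F.mse_loss(pred, target, reduction='none')   # shape: [batch_size]

# Pair each sample's loss with its own IS weight, then average and backprop.
loss = (is_weights * elementwise_loss).mean()
loss.backward()
```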

being-aerys · Dec 02 '20

In the original paper, the update is also multiplied by the TD error. It seems that if we calculate the gradient of your loss, the TD error doesn't appear as a factor in the coefficient? @stormont
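For reference, the per-sample update in the paper is, in my own notation,

$$\Delta\theta \;\propto\; w_i \,\delta_i\, \nabla_\theta Q_\theta(s_i, a_i),$$

which is what I'm comparing the gradient of the loss above against.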

ChampagneAndfragrance · Jul 15 '22