Results 21 comments of Yoni Gottesman

@ashok-arjun are you still working on this? If not I can take over :)

I think it's because `actions` is a one-hot vector with a 1 only at the chosen action, so multiplying will give you a vector of zeros instead of one...
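A minimal sketch of the one-hot masking described above, in plain Python. The names `q_values`, `masked`, and `chosen_value` and all numbers are made up for illustration; they are assumptions and not taken from the code under discussion.

```python
# Hypothetical values for illustration only.
q_values = [0.2, 1.5, 0.3]   # one estimated value per action
actions = [0, 1, 0]          # one-hot vector: action 1 was chosen

# Element-wise product: every entry except the chosen action's becomes zero.
masked = [q * a for q, a in zip(q_values, actions)]

# Summing the masked vector reduces it to the chosen action's value.
chosen_value = sum(masked)

print(masked)        # [0.0, 1.5, 0.0]
print(chosen_value)  # 1.5
```

So the product is mostly zeros by construction, and a sum over the action axis is what turns it into a single value per sample.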

> I prefer just returning the labels with masking applied, rather than returning the mask for the user to apply.

I agree, but then what should be the ignore label?...

Yeah, I was just thinking of non-PyTorch users, for whom -100 is not the default. Anyway, I updated the code to return `labels`.
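For reference, a rough sketch of what "labels with masking applied" could look like, in plain Python. The helper `mask_labels` and the token ids are hypothetical and only illustrate the idea; this is not the code from the PR. The one fixed fact is that -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(input_ids, assistant_mask, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: copy input_ids, replacing every position that is
    not an assistant token (mask value 0) with ignore_index."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        tok if keep else ignore_index
        for tok, keep in zip(input_ids, assistant_mask)
    ]


# Made-up token ids: the first four belong to the prompt, the last two to the assistant.
input_ids = [101, 7592, 2088, 102, 2023, 2003]
assistant_mask = [0, 0, 0, 0, 1, 1]

labels = mask_labels(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, 2023, 2003]
```

Keeping `ignore_index` as a parameter is one way to cover frameworks where -100 is not the convention.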

@Rocketknight1 yes, this is ready to be reviewed :)

I agree it should be `assistant_mask` and not `labels`. I feel like the collator should be added here and not in `trl`. What do you think?

Fixed your suggestions. Do you think the docstring should contain a small example? For instance, this is the phi template with the new token: ``` "{{ bos_token }}" "{% for...