Michael Gschwind
@pytorchbot merge
@byjlw please assign this issue to somebody on your team to resolve. We missed the release cut on this, but let's stop letting these issues slip through uncontrolled.
torch.nn.MultiheadAttention is defined to accept floats => https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html "For a float mask, the mask values will be added to the attention weight."
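To illustrate the documented behavior, here is a minimal sketch (shapes and hyperparameters are arbitrary, chosen only for the example): a float `attn_mask` is added to the attention weights, so 0.0 entries are no-ops and -inf entries block attention entirely.

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 4, 8)  # (batch, seq, embed)

# Float mask of shape (L, S): values are added to the attention weights,
# so 0.0 leaves a position untouched and -inf masks it out completely.
attn_mask = torch.zeros(4, 4)
attn_mask[:, -1] = float("-inf")  # no query may attend to the last key

out, attn_weights = mha(x, x, x, attn_mask=attn_mask)
print(attn_weights[0, :, -1])  # ~0 attention to the masked key
```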
(At least) as early as November 2021, we issued a warning about byte tensors being deprecated for torch.nn.MultiheadAttention, e.g., here => https://github.com/pytorch/pytorch/issues/67999 `warnings.warn("Byte tensor for attn_mask in nn.MultiheadAttention is deprecated...")`
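For reference, a minimal sketch of the migration the warning asks for, assuming a simple causal mask: a uint8 (byte) mask triggers the deprecation warning (and may be rejected outright in newer releases), while the equivalent bool mask is the supported form, with True meaning "do not attend".

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 4, 8)

# Causal mask: positions above the diagonal are masked out.
byte_mask = torch.triu(torch.ones(4, 4, dtype=torch.uint8), diagonal=1)  # deprecated, warns
bool_mask = byte_mask.bool()  # supported replacement: True = "do not attend"

out, _ = mha(x, x, x, attn_mask=bool_mask)
```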
I recommend that we correct the documentation for MultiheadAttention to reflect that byte masks were deprecated a while ago, e.g., here => https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html as well as here => https://pytorch.org/docs/stable/generated/torch.nn.quantizable.MultiheadAttention.html#torch.nn.quantizable.MultiheadAttention.forward
It's slightly more complicated than this, because key_padding_mask might be either Boolean or Float. Are permutations allowed where one mask is Boolean and the other is Float, or should we...
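Whether mixed permutations are supported is exactly the open question here. As a concrete sketch of the case under discussion (assuming a recent PyTorch build, which merges the two masks internally, this runs without error, but verify on the target version):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(2, 4, 8)

attn_mask = torch.zeros(4, 4)    # float mask: values are added to the weights
attn_mask[0, 2] = float("-inf")  # query 0 cannot attend to key 2

key_padding_mask = torch.tensor([[False, False, False, True],
                                 [False, False, True,  True]])  # bool: True = padding

out, _ = mha(x, x, x, attn_mask=attn_mask, key_padding_mask=key_padding_mask)
```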
Presumably you had a test case that demonstrates the problem? Can you please create a PR and submit it, so we can verify the PR fixes the issue?
Can you please provide more context as to what you see as the problem? Maybe we haven’t documented it in the doc strings clearly enough that floats are intended to...
I’ll remove it until we find out whether it buys us performance (I’ve seen additional improvements for CPU SDPA land from the Intel team since I did my experiments)...
Waiting on a review, which is required to merge. Addressed @Chillee's feedback. If he's not available, who else can review? @cpuhrsch @jisaacso?