cold-compress
How to get attention scores
Does your code have a function for analyzing attention scores, or do they need to be observed inside the Transformer class?
Another question, not specific to this great repository: when discarding KV entries, does the decision depend on attention scores? If keys and values with different patterns happen to produce small attention scores, discarding them seems to lose information.
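To make the second question concrete, here is a minimal sketch of what I understand by score-based eviction. It is a toy in plain Python, not code from cold-compress; the names `attention_weights` and `evict_lowest` and the `budget` parameter are made up for illustration, and I am assuming a single head and a single query.

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each cached key."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

def evict_lowest(keys, values, scores, budget):
    """Keep the `budget` cache entries with the highest scores, in original order."""
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [keys[i] for i in keep], [values[i] for i in keep], keep

# Toy cache with four tokens (hypothetical numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0], [40.0]]
query = [1.0, 0.0]

weights = attention_weights(query, keys)
kept_keys, kept_values, kept_idx = evict_lowest(keys, values, weights, budget=2)
print(kept_idx)
```

In this toy, the last token gets the smallest weight for the current query and is evicted, yet a later query pointing the other way (for example `[-1.0, 0.0]`) would have attended to it strongly. That is the kind of information loss I am asking about.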
I sincerely look forward to your reply!