
Only last few tokens are attributed when explaining GRU

Open itsmemala opened this issue 3 years ago • 2 comments

I'm working with a 1-layer GRU for text classification that takes BERT embeddings as input. Each input sequence has shape (sequence length, bert-embedding-dimension). I'm looking for word-level attribution scores for each sequence's prediction. Currently, with the Captum integrated gradients and occlusion explainers, the attribution scores are almost always concentrated on the last few tokens of each sequence. This seems to stem from the directional processing of the GRU - any thoughts? Or do I need a more careful choice of baseline? Or could it be an implementation error?
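For reference, a minimal sketch of the setup described above, with per-token attributions computed via a hand-rolled integrated-gradients loop (the same path integral Captum's `IntegratedGradients` approximates). The model, sizes, and random inputs are illustrative stand-ins, not the poster's actual code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes, not taken from the original post.
EMBED_DIM, HIDDEN_DIM, NUM_CLASSES, SEQ_LEN = 16, 32, 2, 10

class GRUClassifier(nn.Module):
    """1-layer GRU classifier over precomputed embeddings."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.fc = nn.Linear(HIDDEN_DIM, NUM_CLASSES)

    def forward(self, x):            # x: (batch, seq_len, embed_dim)
        _, h_n = self.gru(x)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])      # logits: (batch, num_classes)

def integrated_gradients(model, inputs, baseline, target, steps=50):
    """Average gradients along the straight-line path from baseline to
    inputs, then scale by (inputs - baseline) -- a Riemann approximation
    of the integrated-gradients integral."""
    grads = torch.zeros_like(inputs)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (inputs - baseline)).requires_grad_(True)
        logits = model(point)
        logits[0, target].backward()
        grads += point.grad
    return (inputs - baseline) * grads / steps

model = GRUClassifier().eval()
x = torch.randn(1, SEQ_LEN, EMBED_DIM)          # stand-in for BERT embeddings
baseline = torch.zeros_like(x)                  # Captum's default baseline
attr = integrated_gradients(model, x, baseline, target=0)

# Word-level scores: sum attribution over the embedding dimension.
token_scores = attr.sum(dim=-1).squeeze(0)
print(token_scores.shape)  # torch.Size([10])
```

Summing the per-dimension attributions over the embedding axis is one common way to get a single score per token; inspecting `token_scores` across positions is how the "only the last few tokens" pattern would show up.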

itsmemala avatar Nov 17 '22 11:11 itsmemala

Could you elaborate on this?

I get attribution scores that are almost always the last few tokens of each sequence

Do you mean you get attribution scores for all tokens, but only the last few tokens have positive/negative scores while the others are zero? What baseline are you using?

aobo-y avatar Dec 08 '22 02:12 aobo-y

Hi,

Yes, I get attribution scores for all tokens, but the scores increase along the sequence: the first token has a score on the order of 1e-27 (at each embedding dimension), and the magnitude slowly increases until the last few tokens have scores on the order of 1e-3 to 1e-2 (again at each embedding dimension). This trend is the same for all inputs. I do not specify a baseline, so I believe Captum's default of an all-zero input is being used. Could this be stemming from the multiplicative nature of the GRU?
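The multiplicative-gating hypothesis can be sanity-checked without Captum at all: if gradients of the final hidden state with respect to early input positions vanish, any gradient-based attribution (including integrated gradients along a path from an all-zero baseline) will assign tiny scores to early tokens. A small sketch, with illustrative sizes and a randomly initialised GRU rather than the trained model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes, not taken from the original post.
EMBED_DIM, HIDDEN_DIM, SEQ_LEN = 16, 32, 20
gru = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)

# Gradient of the final hidden state w.r.t. every input position.
x = torch.randn(1, SEQ_LEN, EMBED_DIM, requires_grad=True)
_, h_n = gru(x)
h_n.sum().backward()

# Per-position gradient norms. If these decay toward the start of the
# sequence, the ascending attribution pattern reflects the recurrence
# (repeated gating per step), not a bug in the explainer.
grad_norms = x.grad.norm(dim=-1).squeeze(0)
print(grad_norms)
```

If the norms do decay sharply, trying a more meaningful baseline (e.g. the embeddings of a [PAD]-token sequence instead of all zeros) or an attribution method less tied to local gradients, such as Captum's `Occlusion` with a larger window, may be worth comparing against.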

itsmemala avatar Dec 18 '22 19:12 itsmemala