
Attention Weights

sashank06 opened this issue · 7 comments

Does this library return the attention weights that can be obtained from the BERT model through PyTorch Transformers?

sashank06 commented Aug 16, 2019

Hi @sashank06, I am also interested in returning the attention weights as in the PyTorch Transformers models, so I've been exploring it a little. With the current options of Fast-bert, I don't think it's possible. However, there is a simple workaround.

According to the Huggingface Transformers docs:

# Models can return full list of hidden-states & attentions weights at each layer
model = model_class.from_pretrained(pretrained_weights, 
                                    output_hidden_states=True, 
                                    output_attentions=True)

Moreover, according to the Huggingface Transformers model docstrings:

**attentions**: (`optional`, returned when ``config.output_attentions=True``)
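
For context, here is a minimal sketch of what that looks like with Transformers directly. This assumes a reasonably recent version (4.x, where models return ModelOutput objects with an attentions attribute; on older versions the attentions are the last element of the returned tuple):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One tensor per layer, each of shape (batch, num_heads, seq_len, seq_len)
attentions = outputs.attentions
print(len(attentions), attentions[0].shape)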

But in Fast-bert that logic lives in the learner_cls.py file, and unfortunately the option is not parametrized:

config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels))

[...]

if multi_label == True:
    model = model_class[1].from_pretrained(pretrained_path, config=config, state_dict=model_state_dict)
else:
    model = model_class[0].from_pretrained(pretrained_path, config=config, state_dict=model_state_dict)

The workaround is to manually add the output_attentions parameter to the config. So we have to replace:

config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels))

with:

config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels), output_attentions=True)

After that, the output of the predict_batch function (for example) contains the attention weights.
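
For illustration, here is a rough sketch of how one could then pull the attentions out through the learner's underlying model. This assumes the learner exposes the model as learner.model, the tokenizer as learner.data.tokenizer, and the device as learner.device, which matches my reading of the fast-bert source but may differ between versions:

import torch

# learner is a BertLearner created after applying the config change above
model = learner.model
tokenizer = learner.data.tokenizer

token_ids = tokenizer.encode("some example text", add_special_tokens=True)
input_ids = torch.tensor([token_ids]).to(learner.device)

model.eval()
with torch.no_grad():
    outputs = model(input_ids)

# With output_attentions=True the attentions are the last element of the
# output tuple: one tensor per layer, (batch, num_heads, seq_len, seq_len)
attentions = outputs[-1]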

Note: I can open a pull request with this option parametrized if @kaushaltrivedi wants me to and is willing to merge it. Hope this helps.

alberduris commented Oct 7, 2019

Thanks. Please create the pull request. Happy to merge it.

kaushaltrivedi commented Oct 7, 2019

I am assuming you only need attention weights during inference time.

kaushaltrivedi commented Oct 7, 2019

Yes, that's it.

alberduris commented Oct 7, 2019

@alberduris I have been doing the same using output_attentions=True. It would be a great feature to integrate into fast-bert.

sashank06 commented Oct 7, 2019

I updated the load_model method in learner_cls.py by adding output_attentions=True to the from_pretrained methods, but after loading my model with

model = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path='model_out',
    metrics=[{'name': 'accuracy', 'function': accuracy}],
    device=torch.device("cuda"),
    logger=logging.getLogger(),
    output_dir='output')

the predict_batch method still does not return any attention weights.

What am I missing?

fuuman commented Oct 5, 2020

Sorry, but I haven't been hacking on this library for a while now, so I am a bit out of date. Check the Transformers repo, see how they handle the attention outputs, trace the function calls and the parameters they need, and check whether everything matches in this library. Anyway, maybe @kaushaltrivedi can help you.

It would be much appreciated if you post the solution here in case you manage to solve it :rocket:
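
If it helps while you debug, one way to sidestep fast-bert entirely is to load the fine-tuned weights straight into Transformers and request the attentions there. A rough sketch, assuming your 'model_out' directory contains a standard saved model and tokenizer:

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("model_out")
model = BertForSequenceClassification.from_pretrained("model_out", output_attentions=True)
model.eval()

inputs = tokenizer("some example text", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The attentions are the last element of the output tuple (or
# outputs.attentions on newer versions): one tensor per layer,
# shaped (batch, num_heads, seq_len, seq_len)
attentions = outputs[-1]
print(len(attentions), attentions[0].shape)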

alberduris commented Oct 6, 2020