EMNLP 2018: Multi-Head Attention with Disagreement Regularization; NAACL 2019: Information Aggregation for Multi-Head Attention with Routing-by-Agreement

On the Diversity of Multi-Head Attention

Implementation of the EMNLP 2018 paper "Multi-Head Attention with Disagreement Regularization" and the NAACL 2019 paper "Information Aggregation for Multi-Head Attention with Routing-by-Agreement", built on the THUMT toolkit.

More details, including data and pre-trained models, will be released later.