FreyWang

Results 2 issues of FreyWang

It seems that the attention weights are calculated differently from the original paper, which uses softmax(v * tanh(W * [s, h])); here, ReLU is used after the softmax. Can you give some reasons or a reference? ...
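For reference, the formula quoted in the issue, softmax(v * tanh(W * [s, h])), can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function names, shapes, and numeric values are hypothetical and are not taken from the repository the issue was filed against. The last line shows that applying ReLU directly to a softmax output changes nothing, since softmax outputs are strictly positive; whether that is what the code in question does cannot be told from the truncated snippet.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention_weights(s, hs, W, v):
    """Compute softmax(v . tanh(W [s; h])) over all encoder states h.

    s  : decoder state, list of floats (hypothetical name)
    hs : list of encoder states, each a list of floats
    W  : weight matrix as a list of rows, each of length len(s) + len(h)
    v  : weight vector with one entry per row of W
    """
    scores = []
    for h in hs:
        concat = s + h  # [s, h] concatenation
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, concat)))
            for row in W
        ]
        scores.append(sum(v_i * a_i for v_i, a_i in zip(v, hidden)))
    return softmax(scores)


# Made-up illustrative values, not from any real model.
s = [0.5, -0.2]
hs = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.3]]
W = [[0.1, 0.2, 0.3, 0.4], [-0.4, 0.3, 0.2, -0.1]]
v = [1.0, -1.0]

weights = additive_attention_weights(s, hs, W, v)

# ReLU applied to a softmax output is a no-op: every weight is already > 0.
relu_weights = [max(0.0, w) for w in weights]
```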

## Motivation When using `save_best='mIoU'` or another metric in cfg.evaluation with `separate_eval=True`, the result metric keys change to `0_mIoU` and `1_mIoU`... This leads to an error when saving the best checkpoint ##...
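The key mismatch described in this issue can be illustrated with an ordinary dictionary. This is a hedged sketch, not the library's implementation: `resolve_metric_keys` is a hypothetical helper, and the metric values are made up purely for demonstration. It shows why a lookup by the bare name `mIoU` fails once keys carry a dataset-index prefix, and one possible way to find the prefixed keys.

```python
def resolve_metric_keys(results, metric):
    """Return the keys in `results` that correspond to `metric`.

    Hypothetical helper: matches the bare name if present, otherwise
    keys of the form '<dataset index>_<metric>' such as '0_mIoU'.
    """
    if metric in results:
        return [metric]
    suffix = "_" + metric
    return [
        key for key in results
        if key.endswith(suffix) and key[:-len(suffix)].isdigit()
    ]


# Made-up evaluation results with per-dataset prefixed keys.
results = {"0_mIoU": 0.71, "1_mIoU": 0.65, "0_mAcc": 0.80}

# The bare key is missing, so results["mIoU"] would raise KeyError.
bare_key_present = "mIoU" in results

# Suffix matching recovers the per-dataset keys instead.
matching = resolve_metric_keys(results, "mIoU")
```

How the per-dataset scores should then be combined into a single "best" value (mean, first dataset, and so on) is a design choice the snippet does not settle.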