SPM_toolkit
About DecAtt
When I use this model for the WikiQA task, I find the batch list hard to follow.
Why do we need to re-sort by length? Also, the interval of batch_list is not 32.
DecAtt is very difficult to train; I tried many ways to make it work, including gradient clipping, sorting by length, etc. Previously, people used length sorting to speed up training and convergence: after sorting, examples of similar length land in the same batch, so the input lengths within a batch don't vary much and little padding is needed.
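A minimal sketch of the length-sorting idea in plain Python, not the SPM_toolkit code: example indices are sorted by their (assumed) token lengths, sliced into batches of `batch_size`, and then only the batch order is shuffled. `make_length_sorted_batches` and `padding_cost` are hypothetical helpers introduced for illustration, and the random lengths are made-up data used only to compare padding against random batching.

```python
import random
from typing import List


def make_length_sorted_batches(lengths: List[int], batch_size: int = 32,
                               shuffle: bool = True, seed: int = 0) -> List[List[int]]:
    """Group example indices into batches of similar length (hypothetical helper).

    lengths[i] is assumed to be the token length of example i.
    """
    # Sort indices by length so neighbouring examples have similar lengths.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # Slice the sorted order into contiguous chunks; the last chunk may be smaller.
    batches = [order[s:s + batch_size] for s in range(0, len(order), batch_size)]
    if shuffle:
        # Shuffle the order of batches (not their contents) so training
        # does not always see short inputs first and long inputs last.
        random.Random(seed).shuffle(batches)
    return batches


def padding_cost(batches: List[List[int]], lengths: List[int]) -> int:
    """Total pad tokens needed if each batch is padded to its longest example."""
    return sum(max(lengths[i] for i in b) * len(b) - sum(lengths[i] for i in b)
               for b in batches)


if __name__ == "__main__":
    rng = random.Random(42)
    lengths = [rng.randint(5, 60) for _ in range(1000)]  # made-up lengths

    sorted_batches = make_length_sorted_batches(lengths, batch_size=32)

    # Baseline: random batches of the same size.
    idx = list(range(len(lengths)))
    rng.shuffle(idx)
    random_batches = [idx[s:s + 32] for s in range(0, len(idx), 32)]

    print("pad tokens, length-sorted:", padding_cost(sorted_batches, lengths))
    print("pad tokens, random:       ", padding_cost(random_batches, lengths))
```

With length sorting, each batch is padded only up to a length close to that of its other members, which is the "input doesn't vary a lot" effect described above; shuffling the batch order keeps some randomness across training steps.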