yaserkl
@artetxem I've used a similar approach with ELMo word embeddings. I have two almost identical vocab files in English whose embeddings I extracted using ELMo. I just wanted to...
Hi, it comes from fine-tuning.
Could you please share your command?
Yes, if you get the same results during decoding, your trained model still hasn't converged. You should check the model's convergence by monitoring the loss you receive from...
For the CNN/DM dataset with 287,226 training examples, I'd suggest the following setup: train this model using batch size 32 for 15 epochs (max_iter=134637), then activate RL training for another 15...
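The max_iter value above follows directly from the dataset size, batch size, and epoch count; as a quick sanity check:

```python
# Sanity check for max_iter: (training examples / batch size) * epochs.
# Numbers come from the comment above (CNN/DM training split).
train_size = 287226
batch_size = 32
epochs = 15

steps_per_epoch = train_size / batch_size        # ~= 8975.8 steps per epoch
max_iter = int(steps_per_epoch * epochs)         # truncate to whole steps
print(max_iter)  # -> 134637
```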
Yes, but this only works if you already have a well-trained model under the MLE loss. In practice, researchers usually do not set eta=1 but use a value...
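The eta mixing described here is typically implemented as a convex combination of the MLE and RL losses (as in Paulus et al.'s mixed objective). A minimal sketch, where the function names and the linear eta schedule are illustrative assumptions, not this package's exact API:

```python
def mixed_loss(loss_mle, loss_rl, eta):
    """Convex combination of MLE and RL losses.

    eta=0 -> pure MLE training; eta=1 -> pure RL training.
    As noted above, eta=1 is rarely used directly: the model
    should first converge under MLE, then eta is raised toward
    (but usually kept below) 1.
    """
    return (1.0 - eta) * loss_mle + eta * loss_rl


def eta_schedule(step, ramp_steps=10000, eta_max=0.99):
    """Hypothetical linear ramp: grow eta over ramp_steps, capped at eta_max."""
    return min(eta_max, (step / ramp_steps) * eta_max)


# Example: halfway through the ramp, the loss is an equal-weight blend.
eta = eta_schedule(5000)                  # ~0.495
loss = mixed_loss(2.0, 4.0, eta)
```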
Yes, you still need to keep RL training enabled, since you are training with RL, and make sure to set eta to the right value for the coverage.
Could you please share your decoding command and some of the outputs?
I wouldn't expect to see an improvement in the results if you use the default hyperparameters in this package. This package is based on the Pointer-Generator model. Please note that Paulus...
No, this issue is [well-discussed](https://github.com/abisee/pointer-generator/issues/12) on the original pointer-generator model page. Every time you run this model, it will generate different results due to the multi-process batching used in...