gotutiyan
I faced the same problem. I tried adding the `rounding_mode='trunc'` option to `torch.div()` in [line 81](https://github.com/Katsumata420/generic-pretrained-GEC/blob/master/BART-GEC/fairseq/search.py#L81), i.e.

```python
torch.div(self.indices_buf, vocab_size, out=self.beams_buf, rounding_mode='trunc')
```

There are still many warnings after this...
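For context, `rounding_mode='trunc'` makes the division truncate toward zero (the behavior the old implicit integer division relied on), which differs from Python's floor division for negative values. A minimal sketch in plain Python, independent of torch:

```python
import math

def trunc_div(a, b):
    # Truncate toward zero, matching torch.div(..., rounding_mode='trunc')
    return math.trunc(a / b)

print(trunc_div(7, 2))    # 3
print(trunc_div(-7, 2))   # -3, whereas floor division -7 // 2 gives -4
```

For beam search this distinction rarely matters, since the indices being divided are non-negative, but `trunc` preserves the pre-deprecation semantics exactly.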
I don't know of any de facto standard method, but I think these scripts can be used reliably:

- M2Convertor: https://github.com/Jason3900/M2Convertor
- convert_m2_to_parallel.py: https://github.com/kanekomasahiro/gec_tutorial/blob/main/src/convert_m2_to_parallel.py
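A minimal sketch of what such a conversion does, assuming the standard M2 format (an `S` source line followed by `A` edit lines); the function and variable names here are illustrative, not taken from either repo:

```python
def m2_to_parallel(m2_block, annotator_id=0):
    """Apply one annotator's M2 edits to recover a (source, target) pair."""
    lines = m2_block.strip().split("\n")
    source_tokens = lines[0][2:].split()   # drop the leading "S "
    target_tokens = list(source_tokens)
    offset = 0                             # index shift from earlier edits
    for line in lines[1:]:
        if not line.startswith("A "):
            continue
        fields = line[2:].split("|||")
        span, etype, correction = fields[0], fields[1], fields[2]
        # Keep only this annotator's edits; skip no-op annotations
        if int(fields[-1]) != annotator_id or etype == "noop":
            continue
        start, end = map(int, span.split())
        repl = correction.split() if correction != "-NONE-" else []
        target_tokens[start + offset:end + offset] = repl
        offset += len(repl) - (end - start)
    return " ".join(source_tokens), " ".join(target_tokens)

block = "S This are a sentence .\nA 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0"
src, tgt = m2_to_parallel(block)
# src: "This are a sentence .", tgt: "This is a sentence ."
```

The linked scripts handle more edge cases (multiple annotators, blank-line-separated blocks, file I/O), so I'd use one of them rather than rolling your own.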
I'm not the author of the paper, but I can provide information for the last question. In general, GECToR trains only the classifier layers first, so the memory usage...
Typically, the four datasets are used together (whether with GECToR or seq2seq models). If the datasets were used one by one, it would be 4-stage training. However, given that...