Nitish Shirish Keskar
Unfortunately, we don't have one. However, the process of building it is identical. All you need are the two solutions and a function that computes the loss/accuracy on intermediate points....
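To make that concrete, here is a minimal sketch of evaluating the loss along the line segment between two solutions. The function and variable names (`interpolate_loss`, `loss_fn`, `theta_a`, `theta_b`) are hypothetical, and the solutions are assumed to be flattened into plain parameter vectors:

```python
import numpy as np

def interpolate_loss(theta_a, theta_b, loss_fn, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the segment
    between two solutions theta_a and theta_b (hypothetical names)."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = []
    for alpha in alphas:
        # Convex combination of the two parameter vectors.
        theta = (1.0 - alpha) * theta_a + alpha * theta_b
        losses.append(loss_fn(theta))
    return alphas, losses

# Toy example with a quadratic loss and two solutions on either
# side of its minimum; a real run would plug in the model's
# validation loss/accuracy instead.
theta_a = np.array([-1.0, 0.0])
theta_b = np.array([1.0, 0.0])
alphas, losses = interpolate_loss(theta_a, theta_b,
                                  lambda t: float(np.sum(t ** 2)))
```

With a real model, `loss_fn` would load the interpolated vector back into the network and run an evaluation pass over the data.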
Our apologies; we intend for both to be on by default so as to reduce the number of flags passed for each run. We will work on fixing that to...
Can you try on the commit https://github.com/salesforce/awd-lstm-lm/tree/bf0742cab41d8bf4cd817acfe7e5e0cbff4131ba ? If that works, I can help you with getting the improvements from that commit for low-vocabulary datasets.
@xsway I think your issue is linked to https://github.com/salesforce/awd-lstm-lm/pull/32 — everything is working as expected, but we're printing the wrong validation loss/perplexity. Could you try patching that change and...
Thanks for pointing this out. I'll add such an instruction.
I'm not sure this is an OOM error. The training should succeed on a 16GB V100. Can you provide more details about the file you're fine-tuning on, the TF version, etc.? Did...
Yeah, I was able to replicate this. I was testing the fine-tuning on a 32GB V100 and it worked with higher batch sizes. Let me look into fine-tuning with lower...
While I explore this, I noticed a PR that seems to circumvent this issue (https://github.com/salesforce/ctrl/pull/51). I haven't tested this out but it might be a temporary solution.
> Yeah, I can confirm I also can't get V100 16gb 8CPU, 30gb Ram, 100gb SSD to work with tensorflow-gpu==1.14 on the moby dick training example with batch_size = 1...
I haven't quite figured out how to get TPUs to be faster than GPUs for inference. I'll probably look into this soon. It's made even more complicated by top-k/nucleus sampling and...
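For reference, the nucleus (top-p) filtering step mentioned above can be sketched as follows. This is a generic illustration, not the repo's implementation; `nucleus_filter` and its arguments are hypothetical names:

```python
import numpy as np

def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, zero out the rest, and renormalize (hypothetical helper)."""
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # include the token that crosses p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# With p=0.8, only the top two tokens (0.5 + 0.3) survive,
# renormalized to 0.625 and 0.375.
probs = np.array([0.5, 0.3, 0.15, 0.05])
filtered = nucleus_filter(probs, p=0.8)
```

The data-dependent cutoff is part of why such sampling is awkward to express in the static-shape style that TPU execution tends to favor.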