Nitish Shirish Keskar
Unfortunately, we don't have one. However, the process of building it is identical. All you need are the two solutions and a function that computes the loss/accuracy on intermediate points....
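To make that concrete, here is a minimal sketch of evaluating the loss along the line segment between two solutions. The function and variable names (`interpolate_loss`, `loss_fn`, `theta_a`, `theta_b`) are hypothetical, and the solutions are assumed to be flattened into plain parameter vectors:

```python
import numpy as np

def interpolate_loss(theta_a, theta_b, loss_fn, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the segment
    between two solutions theta_a and theta_b (hypothetical names)."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = []
    for alpha in alphas:
        # Convex combination of the two parameter vectors.
        theta = (1.0 - alpha) * theta_a + alpha * theta_b
        losses.append(loss_fn(theta))
    return alphas, losses

# Toy example with a quadratic loss and two solutions on either
# side of its minimum; a real run would plug in the model's
# validation loss/accuracy instead.
theta_a = np.array([-1.0, 0.0])
theta_b = np.array([1.0, 0.0])
alphas, losses = interpolate_loss(theta_a, theta_b,
                                  lambda t: float(np.sum(t ** 2)))
```

With a real model, `loss_fn` would load the interpolated vector back into the network and run an evaluation pass over the data.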
Our apologies; we intend for both to be on by default so as to reduce the number of flags passed for each run. We will work on fixing that to...
Can you try on the commit https://github.com/salesforce/awd-lstm-lm/tree/bf0742cab41d8bf4cd817acfe7e5e0cbff4131ba ? If that works, I can help you with getting the improvements from that commit for low-vocabulary datasets.
@xsway I think your issue is linked to https://github.com/salesforce/awd-lstm-lm/pull/32 — everything is working as expected, but we're printing the wrong validation loss/perplexity. Could you try patching that change and...
Thanks for pointing this out. I'll add such an instruction.
I'm not sure this is an OOM error. The training should succeed on a 16GB V100. Can you provide more details about the file you're fine-tuning on, the TF version, etc.? Did...
Yeah, I was able to replicate this. I was testing the fine-tuning on a 32GB V100 and it worked with higher batch sizes. Let me look into fine-tuning with lower...
While I explore this, I noticed a PR that seems to circumvent this issue (https://github.com/salesforce/ctrl/pull/51). I haven't tested this out but it might be a temporary solution.
> Yeah, I can confirm I also can't get V100 16gb 8CPU, 30gb Ram, 100gb SSD to work with tensorflow-gpu==1.14 on the moby dick training example with batch_size = 1...
I haven't quite figured out how to get TPUs to be faster than GPUs for inference. I'll probably look into this soon. It's made even more complicated by top-k/nucleus sampling and...
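For reference, the nucleus (top-p) filtering step mentioned above can be sketched as follows. This is a generic illustration, not the repo's implementation; `nucleus_filter` and its arguments are hypothetical names:

```python
import numpy as np

def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, zero out the rest, and renormalize (hypothetical helper)."""
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # include the token that crosses p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# With p=0.8, only the top two tokens (0.5 + 0.3) survive,
# renormalized to 0.625 and 0.375.
probs = np.array([0.5, 0.3, 0.15, 0.05])
filtered = nucleus_filter(probs, p=0.8)
```

The data-dependent cutoff is part of why such sampling is awkward to express in the static-shape style that TPU execution tends to favor.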