Raul Puri
I've been trying to repro BERT's pretraining results from scratch in my own time, and I have been unable to train beyond a masked LM loss of 5.4. So if...
Hi, check out the [PyTorch](https://github.com/guillitte/pytorch-sentiment-neuron/blob/master/visualize.py) version of the code. (Neuron 2388 is the sentiment neuron in the paper, as confirmed by [Rakesh Chada](https://rakeshchada.github.io/Sentiment-Neuron.html).)
I can't remember exactly how, but you should be able to set the number of cores/threads that BLAS uses.
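One common way to do this is through environment variables, set before the numerical library is imported. This is only a rough sketch: which variable takes effect depends on the BLAS backend your build links against (I'm assuming OpenMP, OpenBLAS, or MKL here), and `limit_blas_threads` is a made-up helper name.

```python
import os


def limit_blas_threads(n):
    """Ask common BLAS backends to use at most n threads.

    Assumption: the backend reads one of these variables at import time.
    Only the variable matching your actual backend has any effect.
    """
    names = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
    for name in names:
        os.environ[name] = str(n)
    # Return what was set so the caller can log or check it.
    return {name: os.environ[name] for name in names}


# Call this before importing numpy (or anything else that loads BLAS).
settings = limit_blas_threads(4)
```

Setting the variables in the shell before launching the script should work the same way, as long as they are in place before BLAS is loaded.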
Without any regularization, I personally found that uniform sampling gave faster convergence, but was more unstable and blew up (see my issue #39). I also tried Xavier initialization and that...
@jonnykira Like you, I found that they used weight norm in the paper, which I initially glossed over and which isn't in the code base. This turned out to be what I needed...
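For anyone unfamiliar with it, weight norm reparameterizes a weight vector as a unit direction times a learned scale, w = g * v / ||v||. Here is a minimal pure-Python sketch of that formula for a single vector. It is not the paper's code, and `weight_norm`, `v`, and `g` are just illustrative names.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for one weight vector.

    v: list of floats, the unnormalized direction (must be non-zero).
    g: float, the scale, which becomes the Euclidean norm of w.
    """
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


# The result points the same way as v but has length g.
w = weight_norm([3.0, 4.0], 2.0)
```

In training, both `v` and `g` would be learned parameters, with `w` recomputed from them on each forward pass.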
Same problem, please help.
Not a clue, sorry. Memory errors, possibly? It uses a heftier convnet.