nlpfollower

2 comments by nlpfollower

@Gryff1ndor, I think it might be because $\pi_\theta(y_w | x)$ is the probability of the entire sequence $y_w$ conditioned on the input $x$. So after decomposing into tokens: ```math \pi_\theta(y_w\...
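To make the "probability of the entire sequence" point concrete, here is a minimal sketch, assuming the standard autoregressive factorization $\pi_\theta(y | x) = \prod_t \pi_\theta(y_t | x, y_{<t})$, so that the sequence log-probability is the sum of per-token log-probabilities. The function name `sequence_log_prob` and the per-token probabilities are made up for illustration; they do not come from the thread or from any particular library.

```python
import math


def sequence_log_prob(token_probs):
    """Return log pi(y | x) as the sum of per-token log-probabilities.

    token_probs[t] stands for pi(y_t | x, y_<t), the model's probability
    of the t-th response token given the prompt and the earlier tokens.
    """
    return sum(math.log(p) for p in token_probs)


# Hypothetical per-token conditional probabilities for a 3-token response.
token_probs = [0.5, 0.25, 0.8]

log_p = sequence_log_prob(token_probs)  # sum of the token log-probs
seq_prob = math.exp(log_p)              # product of the token probs

print(f"log pi(y|x) = {log_p:.4f}, pi(y|x) = {seq_prob:.4f}")
```

Working in log space is the usual choice because the product of many small token probabilities underflows quickly, while the sum of their logs stays well behaved.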

Happy to help! I'm learning this stuff as well, so take it with a grain of salt, but I think there are a couple of things you can do: 1. Play...