ben fattori

Results: 2 issues by ben fattori

This PR ties the GPT-2 `lm_head` weight to the token embedding weights, so the output projection reuses the embedding matrix instead of learning a separate one. The official GPT-2 TF implementation does the same [here](https://github.com/openai/gpt-2/blob/master/src/model.py#L171).
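
For readers unfamiliar with weight tying, here is a minimal, framework-free sketch of the idea in plain Python. It is not the code from this PR: the class `TinyTiedLM` and its attribute names are hypothetical, and the toy initialization is an assumption made only so the example runs. The point it illustrates is that the output head holds a reference to the embedding matrix rather than a copy, so logits are computed as `hidden @ wte^T`.

```python
# Minimal sketch of weight tying. All names here are hypothetical and
# chosen for illustration; this is not the implementation from the PR.
class TinyTiedLM:
    def __init__(self, vocab_size, n_embd):
        # Token embedding matrix: one row of n_embd floats per vocabulary entry.
        # The values are an arbitrary toy initialization.
        self.wte = [[0.01 * (i + j) for j in range(n_embd)]
                    for i in range(vocab_size)]
        # Tie the output projection to the embedding: same object, not a copy.
        self.lm_head_weight = self.wte

    def embed(self, token_id):
        # Input side: look up the embedding row for a token.
        return self.wte[token_id]

    def logits(self, hidden):
        # Output side: logits[v] = dot(hidden, wte[v]), i.e. hidden @ wte^T.
        return [sum(h * w for h, w in zip(hidden, row))
                for row in self.lm_head_weight]


model = TinyTiedLM(vocab_size=4, n_embd=3)
model.wte[2][0] = 5.0                      # an update to the embedding ...
assert model.lm_head_weight[2][0] == 5.0   # ... is visible through the head
```

Because both sides share one matrix, any update made through the embedding is seen by the output head and vice versa, and the model stores `vocab_size * n_embd` fewer parameters than an untied version would.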

Thank you for the code! I've been using it as a reference for my own implementation. Have you replicated the results in the original blog post? Based on your update in...