Li Dong
The "as of" template is not handled by the code. For example, "{{As of|2015}}, Microsoft is market-dominant." is converted to ", Microsoft is market-dominant." https://en.wikipedia.org/wiki/Template:As_of > {{As of|2018}} – As...
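A minimal sketch of how the year-only form could be expanded before other templates are stripped. The helper name `expand_as_of` and the regular expression are illustrative assumptions, not part of the extractor's code. Any parameters after the year (month, day, `lc=y`, and so on) are simply dropped here.

```python
import re

# Matches {{As of|YYYY}} and {{As of|YYYY|...}}, case-insensitively.
# Only the year is kept; later parameters are ignored in this sketch.
AS_OF_RE = re.compile(
    r"\{\{\s*as of\s*\|\s*(\d{4})\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def expand_as_of(text: str) -> str:
    """Replace "As of" templates with plain text (hypothetical helper)."""
    return AS_OF_RE.sub(lambda m: f"As of {m.group(1)}", text)


print(expand_as_of("{{As of|2015}}, Microsoft is market-dominant."))
# prints: As of 2015, Microsoft is market-dominant.
```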
In https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/mem_transformer.py#L105, I am wondering why only `key` and `value` are layer-normalized, while `query` is not. In other variants (such as RelMultiHeadAttn), the qkv computation is implemented by...
When installing under Cygwin on Windows, line 23 of setup.py needs to be replaced with: `extra_link_args=["-L" + SRILM_LIB_DIR, "-liconv"],`
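One way to avoid editing the line by hand would be to add `-liconv` only when the build runs under Cygwin. This is a sketch under assumptions: `link_args` is a hypothetical helper, and the `SRILM_LIB_DIR` value below is a placeholder for whatever setup.py already defines.

```python
import sys

# Placeholder; in setup.py this would be the existing SRILM_LIB_DIR value.
SRILM_LIB_DIR = "/path/to/srilm/lib"


def link_args(platform: str = sys.platform) -> list:
    """Build linker arguments, adding libiconv on Cygwin only (hypothetical helper)."""
    args = ["-L" + SRILM_LIB_DIR]
    if platform == "cygwin":  # value of sys.platform under Cygwin
        args.append("-liconv")
    return args


print(link_args("cygwin"))
print(link_args("linux"))
```

The result could then be passed as `extra_link_args=link_args()` in the extension definition.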
sigmoid(x) = 1 / (1 + exp(-x)). The original implementation is missing the `-` in the exponent.
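For illustration, a small sketch of the corrected formula next to the version without the minus sign. The function names are chosen here for the example and do not come from the original code. Without the `-`, the function computes sigmoid(-x), so it decreases in x instead of increasing.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def sigmoid_missing_minus(x: float) -> float:
    """The buggy form 1 / (1 + exp(x)); this equals sigmoid(-x)."""
    return 1.0 / (1.0 + math.exp(x))


print(sigmoid(0.0))  # prints: 0.5
print(sigmoid(2.0) > 0.5, sigmoid_missing_minus(2.0) > 0.5)  # prints: True False
```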
Hi @clarkkev, to train [Electric](https://www.aclweb.org/anthology/2020.emnlp-main.20.pdf), use the same pre-training script and command as ELECTRA. Pass `"electra_objective": false` and `"electric_objective": true` in the hyperparameters. **We plan to release pre-trained Electric...
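A small sketch of building the two overrides as a JSON string, assuming the pre-training script takes its hyperparameters in JSON form. Only the two keys come from the comment above; the variable names are illustrative.

```python
import json

# The two overrides named in the comment; everything else keeps its default.
hparams = {
    "electra_objective": False,
    "electric_objective": True,
}

# Python booleans serialize to JSON's lowercase true/false.
hparams_json = json.dumps(hparams)
print(hparams_json)
# prints: {"electra_objective": false, "electric_objective": true}
```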