Pengzhi Gao

16 issues opened by Pengzhi Gao

`GPT2Decoder` currently inherits from `PretrainedGPT2Mixin`. This could be improved in the future (e.g., by inheriting from `TFDecoder`); a sketch of the current mixin pattern follows below.

enhancement
topic: modules
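
For context, a minimal sketch of the mixin pattern this issue refers to; `DecoderBase` and the method bodies are illustrative stand-ins, not the actual texar-pytorch hierarchy.

```python
# Illustrative only: `DecoderBase` and the method bodies are hypothetical
# stand-ins for the real texar-pytorch classes.
import torch.nn as nn


class PretrainedGPT2Mixin:
    """Sketch: resolves architecture hparams and loads GPT-2 checkpoints."""

    def load_pretrained_config(self, pretrained_model_name: str) -> None:
        ...  # checkpoint-name -> architecture resolution, elided


class DecoderBase(nn.Module):
    """Hypothetical shared decoder interface (a possible future base)."""

    def forward(self, inputs):
        raise NotImplementedError


class GPT2Decoder(PretrainedGPT2Mixin, DecoderBase):
    """Checkpoint handling comes from the mixin; decoding from the base.

    The issue suggests swapping the base class in a future refactor.
    """

    def forward(self, inputs):
        return inputs  # placeholder
```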

1. Add a CI test for building the documentation (do not ignore `warnings`, and add a spellcheck). 2. Fix docstrings with incorrect/inconsistent Sphinx format. Currently, such issues are treated as `warnings` in the...

enhancement
topic: docs
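
A minimal sketch of what such a CI check could run, assuming a standard Sphinx layout under `docs/` and the `sphinxcontrib-spelling` extension; the paths and the spelling builder are assumptions, not the repo's actual CI configuration.

```python
# Assumed layout: Sphinx sources under docs/, sphinxcontrib-spelling
# installed and enabled in docs/conf.py.
import subprocess

# -W promotes Sphinx warnings to errors, so the build fails on
# malformed docstrings instead of silently passing.
subprocess.run(
    ["sphinx-build", "-W", "-b", "html", "docs", "docs/_build/html"],
    check=True,
)

# Spellcheck pass via the sphinxcontrib-spelling builder.
subprocess.run(
    ["sphinx-build", "-W", "-b", "spelling", "docs", "docs/_build/spelling"],
    check=True,
)
```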

Setting `pretrained_model_name` will not only define the model architecture but also load the pre-trained checkpoint. We should have another `hparam` to control whether or not to load the pre-trained checkpoint (see the sketch below).

enhancement
good first issue
topic: modules
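
A minimal sketch of the proposed split, with a hypothetical `load_pretrained_weights` hparam; the flag name and the helper functions are illustrative, not an existing texar-pytorch API.

```python
# Hypothetical sketch: `load_pretrained_weights`, `build_architecture`, and
# `load_checkpoint` are illustrative names, not texar-pytorch APIs.
def build_architecture(pretrained_model_name):
    """Stand-in: construct the module with the architecture the name implies."""
    return object()


def load_checkpoint(model, pretrained_model_name):
    """Stand-in: download and load the matching pre-trained weights."""


def init_module(hparams):
    model = build_architecture(hparams["pretrained_model_name"])
    # The proposed flag: keep the architecture defined by the name, but
    # optionally skip the checkpoint, e.g. when training from scratch.
    if hparams.get("load_pretrained_weights", True):
        load_checkpoint(model, hparams["pretrained_model_name"])
    return model
```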

Requested by the Forte team. @hunterhector

enhancement
topic: modules

Adapted from [sequence_tagging](https://github.com/asyml/texar/tree/master/examples/sequence_tagging) in `texar-tf`.

Add a Texar-styled ELMo encoder adapted from `allennlp`. The corresponding tokenizer will be in another PR. Resolves some comments in #298. I checked the implementation of `ELMo` in `allennlp`; it seems...
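
For reference, the `allennlp` API being adapted looks roughly like this; the options/weights paths are placeholders, and the output dimension depends on which pre-trained model they point to.

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholders: point these at a real ELMo options JSON and weights HDF5.
options_file = "elmo_options.json"
weight_file = "elmo_weights.hdf5"

elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# `batch_to_ids` maps tokenized sentences to padded character-id tensors.
character_ids = batch_to_ids([["Texar", "is", "fun"], ["Hello", "world"]])
outputs = elmo(character_ids)
# outputs["elmo_representations"]: list of (batch, seq_len, dim) tensors
# outputs["mask"]: (batch, seq_len) padding mask
```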

There are some subtle differences between the `BPE` implementation in [sentencepiece](https://github.com/google/sentencepiece) and the one in [subword-nmt](https://github.com/rsennrich/subword-nmt). We could probably delete everything except `multi-bleu.perl` in [texar-pytorch/bin/utils](https://github.com/asyml/texar-pytorch/tree/master/bin/utils) after this one is implemented. [Transformer...

enhancement
topic: data
topic: examples
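
A minimal sketch contrasting the two toolkits on the same corpus; file names are placeholders. `sentencepiece` treats the input as a raw sentence stream (encoding whitespace with the meta symbol `▁`), while `subword-nmt` learns merges over a whitespace-pre-tokenized vocabulary, which is one source of the subtle output differences the issue mentions.

```python
import sentencepiece as spm

# Train a BPE model directly on raw text (no pre-tokenization needed).
spm.SentencePieceTrainer.train(
    input="train.txt", model_prefix="bpe", vocab_size=8000, model_type="bpe"
)
sp = spm.SentencePieceProcessor(model_file="bpe.model")
print(sp.encode("subword tokenization", out_type=str))

# subword-nmt equivalent (CLI, run separately on pre-tokenized text):
#   subword-nmt learn-bpe -s 8000 < train.txt > codes.bpe
#   subword-nmt apply-bpe -c codes.bpe < input.txt > output.bpe
```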

Resolves #232 (including the `distributed_gpu` and `language_model_ptb` examples).

https://openreview.net/pdf?id=H1eA7AEtvS

enhancement
topic: modules

In `texar-pytorch/bin/utils`, `sentencepiece` is a package rather than a tokenization method, so "`sentencepiece` encoding" is not an accurate way to describe the tokenization method. `sentencepiece` includes two sub-word tokenization...

enhancement
topic: examples
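
A minimal sketch of the distinction the issue draws, assuming the two methods are BPE and the unigram language model (the two sub-word `model_type`s that `sentencepiece` ships); file names are placeholders.

```python
import sentencepiece as spm

# The package is the same; the `model_type` selects the actual
# tokenization method, which is what the naming should reflect.
for method in ("bpe", "unigram"):
    spm.SentencePieceTrainer.train(
        input="train.txt",
        model_prefix=f"spm_{method}",
        vocab_size=8000,
        model_type=method,
    )
```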