Lei Li
@wyxcc You can download a poem or lyrics corpus, preprocess it (e.g., tokenize, then pad/truncate to a max length), and save it in the same format I used in [SeqGAN_Poem](https://github.com/TobiasLee/SeqGAN_Poem), then...
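A minimal sketch of that preprocessing step. The toy corpus, vocabulary scheme, and `max_len` here are my own assumptions for illustration, not the exact values used in the repo:

```python
# Hypothetical preprocessing sketch: tokenize, then pad/truncate to max_len,
# then serialize as space-separated word indices, one poem per line.
def build_vocab(lines):
    vocab = {"<PAD>": 0}  # reserve index 0 for padding (an assumption)
    for line in lines:
        for tok in line.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(line, vocab, max_len, pad_id=0):
    ids = [vocab[t] for t in line.split()][:max_len]  # truncate to max_len
    return ids + [pad_id] * (max_len - len(ids))      # pad to max_len

corpus = ["a b c d e", "f g h i j k l"]               # toy pre-tokenized lines
vocab = build_vocab(corpus)
encoded = [encode(line, vocab, max_len=6) for line in corpus]

# One poem per line, indices separated by spaces
serialized = "\n".join(" ".join(map(str, row)) for row in encoded)
```

Every row ends up with exactly `max_len` indices, which is what fixed-length sequence models like SeqGAN expect.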
1. You can give the model a "hint" by changing the start token or the initial hidden state, which is initialized to all zeros. 2. Change the positive file to some...
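To make the first point concrete, here is a toy sketch of replacing the all-zeros initial hidden state with a "hint" embedding. The shapes and the idea of seeding `h0` with a keyword's embedding are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 8, 4
emb = rng.normal(size=(vocab_size, hidden_dim))  # toy embedding table

# Default SeqGAN-style generator: start decoding from an all-zeros state.
h0_default = np.zeros(hidden_dim)

# "Hint" variant: initialize the hidden state from a topic/keyword embedding,
# so the generator is biased toward that concept from the first step.
hint_token = 3                      # hypothetical keyword index
h0_hinted = emb[hint_token].copy()
```

The same trick works for the start token: feed the keyword's index as the first input instead of a fixed `<START>` symbol.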
The code doesn't include this baseline trick. You can try it and evaluate the results yourself.
If the vocabulary size is 11681, then the maximum word index should be 11680. This may be an issue in your preprocessing?
You could try reducing the batch size? It shouldn't need much memory.
1. Possibly; our machine at the time had 16 GB of RAM. 2. More likely the language model isn't well trained; consider training it with MLE for a while first.
It seems that the language model itself is not trained well. Is the loss on the training set still decreasing?
Well, I guess you can train the model longer and then check the results to see if it is getting better.
@leviding Claiming install/gpu.md and basic_text_classification.ipynb. For the latter, opening the .md leads to a Colab that shows "file removed", so I only need to translate the .ipynb, right?
Hi, the rough process is as follows:
- Retrieve the triplets in ConceptNet that correspond to the given topic
- Map each triplet's head and tail to word indices via the vocabulary
- Look up their word embeddings to build a memory matrix for the decoder to read & refresh
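The steps above can be sketched as follows. The triplets, vocabulary, embedding table, and the choice of one concatenated head/tail slot per triplet are all toy assumptions for illustration, not the exact implementation:

```python
import numpy as np

# Hypothetical topic-related triplets retrieved from ConceptNet
triplets = [("spring", "RelatedTo", "flower"),
            ("spring", "IsA", "season")]

# Toy vocabulary; index 0 is reserved for unknown words (an assumption)
vocab = {"<UNK>": 0, "spring": 1, "flower": 2, "season": 3}

rng = np.random.default_rng(0)
embed_dim = 5
word_emb = rng.normal(size=(len(vocab), embed_dim))  # toy embedding table

rows = []
for head, _rel, tail in triplets:
    h = vocab.get(head, 0)   # head word -> index
    t = vocab.get(tail, 0)   # tail word -> index
    # One memory slot per triplet: concatenated head and tail embeddings
    rows.append(np.concatenate([word_emb[h], word_emb[t]]))

memory = np.stack(rows)      # shape: (num_triplets, 2 * embed_dim)
```

The decoder can then attend over `memory` at each step (read), and optionally update the slots after use (refresh).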