For issue 1, to use the English tokenizer you need to change the default argument `lang='zh'` to `lang='en'` (dataset.py line 23: the `__init__` of the `DocDataset` class). This change was made...
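If you'd rather not edit the default, you can also pass the language at the call site. A minimal sketch, assuming `DocDataset` accepts `lang` as a keyword argument (the rest of the signature here is a guess, so check dataset.py):

```python
# Hedged sketch: select the English tokenizer when constructing the
# dataset instead of editing the default in dataset.py.
# The exact DocDataset signature is an assumption.
from dataset import DocDataset

docset = DocDataset('my_english_corpus', lang='en')  # default is lang='zh'
```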
For issue 2, this happened because, after stopwords are filtered out, some documents are left with no words and become empty, and those empty documents are not counted as 'processed'. That is...
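For illustration, here is a self-contained toy sketch (not the repo's actual code) of why the 'processed' count can be smaller than the raw document count:

```python
# Toy example: documents that become empty after stopword filtering
# are dropped, so they never show up in the 'processed' count.
stopwords = {'the', 'a', 'is'}
raw_docs = [['the', 'a'], ['topic', 'model', 'is', 'fun']]
processed = []
for doc in raw_docs:
    kept = [w for w in doc if w not in stopwords]
    if kept:  # an all-stopword document is skipped here
        processed.append(kept)
print(len(raw_docs), len(processed))  # prints: 2 1
```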
I am not sure what your situation is. By 'topic names', do you mean the 'topic words' that are displayed during training? If you use the provided tokenizer,...
Improving the filtering strategy to make the models more robust seems a valuable idea. I will fix that.
Two options. One is to convert the genetic data into text, e.g. list all the protein names of one transcription factor on one line, separated by spaces. Customize the tokenizer...
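As a rough sketch of the first option (the mapping, entries, and file name below are made up for illustration):

```python
# Hedged sketch: write one space-separated "document" per transcription
# factor, listing its associated protein names on a single line.
tf_to_proteins = {
    'TP53': ['MDM2', 'BAX', 'CDKN1A'],   # made-up example entries
    'MYC':  ['MAX', 'CDK4'],
}
with open('genes_as_text.txt', 'w') as f:
    for tf, proteins in tf_to_proteins.items():
        f.write(' '.join(proteins) + '\n')  # one line = one document
```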
I've tried to run the GSM model on your data, and the preprocessing step works fine (although it hit an OOM error on my laptop due to the overly large vocabulary...
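One way to work around the OOM is to shrink the vocabulary before training. A sketch using gensim's `Dictionary.filter_extremes` (whether the repo's preprocessing uses gensim this way is an assumption on my part; `corpus.txt` is a placeholder):

```python
# Hedged sketch: prune rare and overly common tokens, and cap the
# vocabulary size, to keep memory usage manageable.
from gensim.corpora import Dictionary

docs = [line.split() for line in open('corpus.txt', encoding='utf-8')]
dictionary = Dictionary(docs)
# keep tokens in >=5 docs and <=1.3% of docs, at most 50k tokens total
dictionary.filter_extremes(no_below=5, no_above=0.013, keep_n=50000)
```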
What do you mean by "latent vector (after processing)"? Do you mean the "topic distribution of a document" or "a document's latent representation"? Yes, you can calculate cosine similarities between...
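In case it helps, a minimal sketch of the cosine-similarity computation (the `theta` vectors are placeholders for whichever representation you mean):

```python
# Cosine similarity between two documents' topic distributions.
import numpy as np

def cosine_sim(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

theta1 = [0.7, 0.2, 0.1]  # placeholder topic distribution of doc 1
theta2 = [0.6, 0.3, 0.1]  # placeholder topic distribution of doc 2
print(cosine_sim(theta1, theta2))
```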
zhdd was produced by first deduplicating DailyDialog and then machine-translating it with the Baidu API. A few entries were lost at the time due to network issues, which is why the total falls short of 13,118. The corrected aligned data is available here: [dailydialog_zh_en.json](https://github.com/zll17/Neural_Topic_Models/blob/master/data/dailydialog_zh_en.json). **Note**: According to the original [license](http://yanran.li/dailydialog), this transformed corpus is also licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).
Thanks for the feedback. I'll look into it and reply later.
Hi, I tested the GSM, WLDA, and WTM models in an environment freshly configured from a clean pull of the repository. GSM was run three times; WLDA and WTM-GMM were each run once. The results are roughly normal, but GSM does show some instability: with identical parameters, TD fluctuated by about 0.2 between two runs (0.423 and 0.626). Here are my commands and results:

| exp_id           | Main parameter differences     | TD    |
| ---------------- | ------------------------------ | ----- |
| gsm_exp0_manual  | --no_above 0.0134 --no_below 5 | 0.423 |
| gsm_exp1_autoadj | --autoadj                      | ...
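For reference, one common definition of the TD metric is the fraction of unique words among the top-k words of all topics; whether this repo computes TD exactly this way is an assumption:

```python
# Hedged sketch of topic diversity (TD): unique top-k words / total
# top-k words across all topics. Higher means more diverse topics.
def topic_diversity(topics, topk=25):
    top_words = [w for topic in topics for w in topic[:topk]]
    return len(set(top_words)) / len(top_words)

topics = [['gene', 'protein', 'cell'],
          ['market', 'stock', 'cell']]  # toy topics
print(topic_diversity(topics, topk=3))  # 5 unique / 6 total ≈ 0.833
```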