
Implementation of topic models based on neural network approaches.

12 issues, sorted by recently updated

pyhanlp does not support Python 3.9 or later.

Regarding data/stopwords.txt: is it an existing stopword list or one you built yourself? If it is an existing list, please give its source; if you built it yourself, please describe the construction rules and anything to watch out for.

I just checked your code and the train *_script runs perfectly, but when I try inference it seems like some 'param' was not saved during training, and as I see...

Hi, and thank you very much for your really helpful code! I am trying to test my trained model and am having problems with the inference.py file. I specified a checkpoint...
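
Both inference problems above usually come down to what is stored in the checkpoint. As a rough illustration only (the toy model below is a stand-in, not this repository's actual classes or scripts), a PyTorch checkpoint can carry the hyperparameters needed to rebuild the model alongside the weights, so inference does not depend on training-time values:

```python
import torch
import torch.nn as nn

# Placeholder standing in for a neural topic model; the repository's real
# classes (e.g. GSM) differ -- this only illustrates the save/load pattern.
class TinyTopicModel(nn.Module):
    def __init__(self, voc_size, n_topic):
        super().__init__()
        self.encoder = nn.Linear(voc_size, n_topic)
        self.decoder = nn.Linear(n_topic, voc_size)

    def forward(self, bow):
        theta = torch.softmax(self.encoder(bow), dim=-1)
        return self.decoder(theta)

voc_size, n_topic = 10000, 20
model = TinyTopicModel(voc_size, n_topic)

# Save the weights together with the hyperparameters needed to rebuild the
# model, so the inference script can reconstruct it on its own.
torch.save({"state_dict": model.state_dict(),
            "voc_size": voc_size,
            "n_topic": n_topic}, "checkpoint.pt")

# Load for inference: rebuild from the stored hyperparameters, restore the
# weights, and switch to eval mode.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model = TinyTopicModel(ckpt["voc_size"], ckpt["n_topic"])
model.load_state_dict(ckpt["state_dict"])
model.eval()
```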

Hello, and thank you for the code you have contributed. I ran GSM on the cnews10 dataset and have one question: the KL divergence vanishes (it is essentially 0) and the discovered topics are very poor. Is this an implementation issue? I used the default parameters. ![image](https://user-images.githubusercontent.com/33925232/143544764-6169c782-1b4d-4e35-a71e-36806b5262a6.png) topic diversity: 0.03866666666666667; c_v: 0.7579875287637481, c_w2v: None, c_uci: -18.122450398623315, c_npmi: -0.6600369278689214; mimno topic coherence: -326.14847513585073. Judging from the TD and NPMI scores, something is wrong with the model.
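
A vanishing KL term like the one reported above is the classic posterior-collapse symptom in VAE-style topic models. One common mitigation, which may or may not match what this repository does, is KL annealing: scale the KL term by a weight that grows from 0 to 1 over training so the encoder can first learn an informative posterior. A minimal sketch:

```python
import torch

def kl_weight(step, warmup_steps=10000):
    # Linearly anneal the KL weight from 0 to 1 over the warm-up period.
    return min(1.0, step / warmup_steps)

def vae_loss(recon_log_probs, bow, mu, logvar, step):
    # Reconstruction term: negative log-likelihood of the bag-of-words input
    # under the decoder's word distribution.
    rec = -(bow * recon_log_probs).sum(dim=-1).mean()
    # KL divergence between the Gaussian posterior N(mu, sigma^2) and N(0, I).
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    # Annealed objective: low KL pressure early in training reduces the risk
    # of the posterior collapsing to the prior.
    return rec + kl_weight(step) * kl
```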

Recently I reproduced the model from the paper following your implementation, and I have a few questions: 1. Have you tried training for more epochs? In my experiments the discriminator loss had already converged by around epoch 3000, but the encoder and generator losses still had not converged after 20K epochs. 2. The topic-word distributions were poor at first, with the same words appearing under many topics, but they improved after roughly 10K epochs. I would be glad to discuss this further with you. Thanks!

[0.011269581504166126 0.00033260477357544005 0.3443009555339813 0.0049138059839606285 0.007035833317786455 0.0002668765955604613 0.0021645957604050636 0.04201849177479744 0.0041013904847204685 0.005380461923778057 0.005701055750250816 0.30710265040397644 0.12966400384902954 0.06940549612045288 0.021206317469477654 0.0028165027033537626 0.0014157032128423452 0.00024422683054581285 0.0011101358104497194 0.039549320936203]:['2006', 'Pangandaran', 'earthquake', 'tsunami', 'occur', 'July', '17', 'subduction', 'zone', 'coast', 'west',...

I edited tokenize.py and in main called `tokenizer = SpacyTokenize()` to use the spaCy tokenizer for English text. However, I always end up getting a `tcmalloc large alloc`...
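
A `tcmalloc large alloc` warning usually means the whole corpus is being materialized in memory at once. Independent of this repository's wrapper (the `SpacyTokenize` name above is taken from the issue), a memory-lighter way to tokenize English text with spaCy is a tokenizer-only pipeline combined with streaming via `nlp.pipe`; a minimal sketch, assuming spaCy 3.x:

```python
import spacy

# Tokenizer-only pipeline: spacy.blank("en") loads no tagger/parser/NER,
# which keeps memory usage low compared to a full pretrained model.
nlp = spacy.blank("en")

def tokenize_stream(texts, batch_size=256):
    # nlp.pipe processes documents in batches and yields them lazily,
    # instead of holding every tokenized document in memory at once.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [tok.text.lower() for tok in doc if not tok.is_space]

texts = ["The 2006 Pangandaran earthquake and tsunami occurred on July 17.",
         "Neural topic models learn document-topic distributions."]
for tokens in tokenize_stream(texts):
    print(tokens)
```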