Keyword_Extraction dataset

It still doesn't work. How should I change it? Any guidance would be appreciated.

Nov 14 '18 08:11 doris-he

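Hello, how did you compute the IDF file from this dataset? Did you simply take math.log(N/n) for each word? That is what I did, but processing the whole dataset takes an extremely long time. Is there a faster way to do it?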

Jan 16 '19 12:01 zzdely

Could you send me a copy of the dataset? Many thanks, [email protected]

Apr 08 '19 08:04 huangtianan

Hello, could you send me a copy of the dataset? Many thanks, [email protected]

May 18 '19 15:05 suifeng227

Dataset link: https://pan.baidu.com/s/1LBfqT86y7TEf4hDNCU6DpA Password: qa2u

Could you send me a copy of the dataset? Many thanks, [email protected]

May 29 '19 06:05 bigzhao

Hello, could you send me a copy of the dataset? Many thanks, [email protected]

Dataset link: https://pan.baidu.com/s/1LBfqT86y7TEf4hDNCU6DpA Password: qa2u

May 29 '19 06:05 bigzhao

Thank you so much!!

May 29 '19 06:05 huangtianan

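Hello, how did you compute the IDF file from this dataset? Did you simply take math.log(N/n) for each word? That is what I did, but processing the whole dataset takes an extremely long time. Is there a faster way to do it?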

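Yes, that is how I did it. The code I used was provided by the author of jieba, and as far as I remember it did not take particularly long to run. You can refer to https://github.com/fxsjy/jieba/issues/393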

May 30 '19 03:05 bigzhao
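
For readers hitting the same slowdown, here is a minimal sketch of computing math.log(N/n) in a single pass over the corpus. It is not the code from this repository or from the jieba issue linked above; `compute_idf` and the toy documents are hypothetical, and the documents are assumed to be tokenized already. If the slowness comes from rescanning the whole corpus for every word (an assumption, since the question does not say), counting document frequencies once per document avoids that.

```python
import math
from collections import Counter


def compute_idf(tokenized_docs):
    """Return {word: log(N / n)}, where N is the number of documents
    and n is the number of documents that contain the word."""
    doc_freq = Counter()
    n_docs = 0
    for tokens in tokenized_docs:
        n_docs += 1
        # set() ensures each word is counted at most once per document
        doc_freq.update(set(tokens))
    return {word: math.log(n_docs / n) for word, n in doc_freq.items()}


# Hypothetical toy corpus, already split into tokens
docs = [
    ["keyword", "extraction", "dataset"],
    ["keyword", "idf"],
    ["dataset", "idf", "idf"],
]
idf = compute_idf(docs)
for word, value in sorted(idf.items()):
    print(word, round(value, 4))
```

The work here grows with the total number of tokens, so the expensive part on a large dataset is usually the word segmentation itself rather than the IDF arithmetic.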

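Hello, I have a question about gensim doc vectors: why does the vector saved during training differ from the one that infer_vector returns for the same parsed keywords? Also, why does the inferred result keep changing from one call to the next? The training samples themselves: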

print common_texts
[['human', 'interface', 'computer'],
 ['survey', 'user', 'computer', 'system', 'response', 'time'],
 ['eps', 'user', 'interface', 'system'],
 ['system', 'human', 'system', 'eps'],
 ['user', 'response', 'time'],
 ['trees'],
 ['graph', 'trees'],
 ['graph', 'minors', 'trees'],
 ['graph', 'minors', 'survey']]

modelnew.docvecs[0]
array([-0.03136408, -0.03000615, 0.03789993, 0.00673222, 0.06904926], dtype=float32)

The inferred results:

vector1 = modelnew.infer_vector(['human', 'interface', 'computer'], alpha=0.1, min_alpha=0.0001, steps=5)
print vector1
[-0.09213404 -0.02755559 -0.05117805 0.03933969 -0.09278027]

vector1 = modelnew.infer_vector(['human', 'interface', 'computer'], alpha=0.1, min_alpha=0.0001, steps=5)
print vector1
[-0.09092788 -0.02800747 -0.05157306 0.03857147 -0.09291061]

Sep 04 '19 03:09 dulimei
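
One way to put numbers on the discrepancy described above is to compare the vectors by cosine similarity instead of element by element. The sketch below only copies the three vectors printed in the post; `cosine` is a hypothetical helper, not part of gensim. As far as I know, infer_vector does not look up the stored vector: it starts from a fresh random vector and runs a few training steps, so some variation between calls is expected, and with a corpus this small and steps=5 the result may land far from the trained vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Vectors copied from the post above
trained = [-0.03136408, -0.03000615, 0.03789993, 0.00673222, 0.06904926]
inferred_1 = [-0.09213404, -0.02755559, -0.05117805, 0.03933969, -0.09278027]
inferred_2 = [-0.09092788, -0.02800747, -0.05157306, 0.03857147, -0.09291061]

sim_between_inferences = cosine(inferred_1, inferred_2)
sim_trained_vs_inferred = cosine(trained, inferred_1)

print("inferred vs inferred:", sim_between_inferences)
print("trained vs inferred: ", sim_trained_vs_inferred)
```

On these numbers the two inferred vectors point in almost the same direction, while the trained vector does not agree with either of them, so the two symptoms in the question are of quite different size.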