
About the evaluation results

Open 1264561652 opened this issue 2 years ago • 5 comments

Can the evaluation results in the README be improved further?

1264561652 avatar May 25 '23 08:05 1264561652

Yes. Higher-quality data and a better model give better results.

shibing624 avatar May 25 '23 09:05 shibing624

What I mean is: can I get higher scores by modifying this codebase itself? If so, which parts should I change for a noticeable improvement?


1264561652 avatar May 25 '23 09:05 1264561652

I didn't deliberately tune for the evaluation sets; you can try tuning the parameters yourself. For example, the generation parameters:

```python
top_p=0.9,
# Moderately raise the nucleus-sampling probability threshold to enlarge
# the candidate token set and increase generation diversity.
temperature=1.0,
# The previous low temperature sharply polarized the token probability
# distribution, effectively degenerating generation into greedy decoding.
do_sample=True,
# do_sample defaults to False; setting it to True switches generation from
# greedy/beam search to multinomial sampling.
no_repeat_ngram_size=6,
# Sets the probability of any repeated 6-gram to 0, so no 6-gram appears
# twice. This value is an empirical first guess.
repetition_penalty=1.8,
# Penalizes tokens that have already appeared, lowering the probability
# that they recur. This value is an empirical first guess.
```
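To make the effect of these knobs concrete, here is a minimal pure-Python sketch of what three of them do under the hood (illustrative only; the `softmax`, `top_p_filter`, and `repetition_penalty` helpers below are my own simplifications, not the actual Hugging Face transformers implementation):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a low temperature sharpens
    (polarizes) the distribution toward the top token."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches top_p; zero out the rest and renormalize."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def repetition_penalty(logits, seen_ids, penalty=1.8):
    """CTRL-style penalty: divide positive logits (multiply negative ones)
    of tokens that already appeared, making them less likely to recur."""
    out = list(logits)
    for i in seen_ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out
```

For instance, with `top_p=0.9`, a tail token that falls outside the nucleus gets probability 0 and can never be sampled, while raising `temperature` flattens the distribution so more of those tail tokens survive the cut.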


shibing624 avatar May 25 '23 09:05 shibing624

OK, thank you.


1264561652 avatar May 25 '23 09:05 1264561652

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 27 '23 07:12 stale[bot]