苏剑林(Jianlin Su)
Is there any released code explaining how to construct whole-word masking for Chinese?
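For context, whole-word masking means that when any character of a segmented word is chosen for masking, all characters of that word are masked together. Below is a minimal sketch assuming the sentence has already been segmented (e.g. by jieba); `whole_word_mask` is a hypothetical helper for illustration, not the released implementation the question asks about:

```python
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Mask at the word level: when a word is selected, replace every
    character in it with the mask token; otherwise keep each character
    as its own token (as in character-level Chinese BERT input)."""
    rng = random.Random(seed)
    out = []
    for word in words:
        if rng.random() < mask_prob:
            # Mask ALL characters of this word together.
            out.extend(mask_token for _ in word)
        else:
            out.extend(word)
    return out
```

With `mask_prob=1.0` the whole sentence is masked word by word; with `0.0` it is returned as plain characters, which makes the word-grouping behavior easy to check.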
First, I already found this answer: https://github.com/TKkk-iOSer/wechat-alfred-workflow/issues/2. Now I'd like to ask: could each user be assigned an independent port, so that each user manages Alfred independently?
The front page says "we provide a way to load huggingface/transformers PyTorch and TensorFlow pretrained models", but https://github.com/Tencent/TurboTransformers/blob/master/example/python/README_cn.md says "first we need to prepare a BERT model trained with huggingface", and I don't see any example of the TensorFlow path.
In Python 2.7: `from flashtext import KeywordProcessor` `keyword_processor = KeywordProcessor()` `keyword_processor.add_keyword(u'北京')` `keyword_processor.add_keyword(u'欢迎')` `keyword_processor.add_keyword(u'你')` `keyword_processor.extract_keywords(u'北京欢迎你')` returns `[u'北京', u'你']`, missing `u'欢迎'`?
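The expected behavior (all three keywords extracted) can be illustrated with a tiny greedy longest-match scanner, which is roughly what flashtext's trie does for CJK text, where every character acts as a word boundary. This is a self-contained sketch of the intended matching logic, not flashtext's actual code:

```python
def extract_keywords(text, keywords):
    """Greedy longest-match keyword extraction, scanning left to right."""
    found, i = [], 0
    # Try longer keywords first so '北京' wins over a hypothetical '北'.
    kws = sorted(keywords, key=len, reverse=True)
    while i < len(text):
        for kw in kws:
            if text.startswith(kw, i):
                found.append(kw)
                i += len(kw)  # jump past the matched keyword
                break
        else:
            i += 1  # no keyword starts here; advance one character
    return found
```

On `'北京欢迎你'` with keywords `['北京', '欢迎', '你']`, this yields all three matches, which suggests the Python 2.7 result in the report is a unicode-handling issue rather than intended matching behavior.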
f-string syntax is only supported on Python 3.6+. I just replaced them with the plain `format` method.
I find that the prefix methods in all of your trie modules do not return the positions of the results. For example, with the datrie module: `>>> trie.prefix_items(u'foobarbaz')` `[(u'foo',...`
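One workaround: for a pure prefix lookup every match starts at offset 0, so its end offset is simply `len(key)`; to locate keys anywhere in the string you can slide the query start. `find_all_with_spans` below is a hypothetical helper using a plain set as a stand-in for the trie, not part of datrie's API:

```python
def find_all_with_spans(text, keys):
    """Return every (key, start, end) occurrence of a key in text.
    With a real trie you would call trie.prefix_items(text[i:]) at
    each offset i instead of scanning the key set."""
    hits = []
    for i in range(len(text)):
        for k in keys:
            if text.startswith(k, i):
                hits.append((k, i, i + len(k)))
    return hits
```

This recovers the span information the question asks for at the cost of an extra pass over the text.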
As we know, I(X,Z) = KL(p(x,z)||p(x)p(z)). So why do you estimate mutual information by JSD maximization rather than KL maximization? f-GAN also gives us KL(p(x)||q(x)) = max_T E_{x~p(x)} [T(x)] - E_{x\sim...
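For reference, the standard f-GAN variational bound for KL (from Nowozin et al.'s f-GAN, using the conjugate $f^*(t)=e^{t-1}$ of $f(u)=u\log u$) can be written out in full, together with its mutual-information form:

```latex
% f-GAN variational representation of the KL divergence:
\mathrm{KL}\big(p(x)\,\|\,q(x)\big)
  = \max_{T}\;\mathbb{E}_{x\sim p(x)}\big[T(x)\big]
    - \mathbb{E}_{x\sim q(x)}\big[e^{T(x)-1}\big]

% Applied to I(X,Z) = KL(p(x,z) || p(x)p(z)):
I(X,Z)
  = \max_{T}\;\mathbb{E}_{(x,z)\sim p(x,z)}\big[T(x,z)\big]
    - \mathbb{E}_{x\sim p(x),\,z\sim p(z)}\big[e^{T(x,z)-1}\big]
```

A commonly cited reason for preferring the JSD-based objective in practice is that the exponential term in the KL bound makes the estimator high-variance and unstable to optimize, whereas for representation learning only the maximizing critic, not the exact MI value, is needed.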
Good job! But I have some questions: 1. What if we use a hinge loss as the EnergyModel loss? The gradient penalty works well, but we know that training with a gradient penalty is slow....
Compared with WGAN-GP or WGAN-div, your new GAN has an additional I(X,Z) term in the generator loss. This term may prevent the generator from mode collapse. As we know, WGAN-GP or WGAN-div...
**Please describe the bug** I attempted to use alpa, and the quick-start example works fine. However, I encountered an error when implementing it with my own model. My model...