苏剑林(Jianlin Su)
Because I want to write my own Python scripts that call the HTTP interface you provide, to implement more flexible functionality.
Also, the HTTP interface was originally intended for Alfred, so it may not need very complex features, but I think (this is a suggestion) it would be worth making the HTTP interface itself more complete, to make secondary development easier. (Admittedly, this is also my own request....)
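For illustration, this is the kind of script I have in mind. It is a sketch only: the URL, port, path, and parameter names below are all hypothetical, since the actual HTTP interface is not documented here.

```python
import requests

# Entirely hypothetical endpoint and parameters -- the tool's actual
# HTTP interface may use a different host, port, path, and fields.
resp = requests.get(
    "http://127.0.0.1:8000/search",            # hypothetical URL
    params={"query": "example", "top_n": 10},  # hypothetical parameters
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```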
Oh, I see, thank you. In other words, this is just a downstream tool of huggingface/transformers, and even the TF version only supports huggingface's TF version.
Does it seem ridiculous that a string matching tool must have a tokenizer?
Oh, sorry, I am not blaming you. As far as I know, many string matching tools work with individual English letters as the minimal unit. I am confused about why you would design...
Maybe you can separate the tokenizer and allow us to write our own, as https://whoosh.readthedocs.io/en/latest/analysis.html does?
I suggest (just a suggestion ^_^) designing it as a pure AC automaton; something like https://github.com/WojciechMula/pyahocorasick/ is more useful and more feasible. pyahocorasick is written in C, and I'd...
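For reference, pyahocorasick matches at the character level and needs no tokenizer at all. A minimal usage sketch:

```python
import ahocorasick  # pip install pyahocorasick

automaton = ahocorasick.Automaton()
for idx, word in enumerate(["he", "her", "hers"]):
    automaton.add_word(word, (idx, word))  # value returned on match
automaton.make_automaton()  # build the Aho-Corasick failure links

# iter() yields (end_index, value) for every match in the haystack,
# scanning character by character -- no tokenization involved.
for end_index, (idx, word) in automaton.iter("ushers"):
    start_index = end_index - len(word) + 1
    print(start_index, end_index, word)
```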
@vi3k6i5 I think the best you can do is separate out the tokenizer, whether for English or Chinese. You could allow us to design our own tokenizer and pass it into...
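To make the suggestion concrete, here is a purely hypothetical sketch of what a pluggable-tokenizer design could look like. None of these class or function names exist in the library; they only illustrate the idea of injecting a caller-supplied tokenizer.

```python
import re
from typing import Callable, List

def default_tokenizer(text: str) -> List[str]:
    """English-centric default: split on runs of word characters."""
    return re.findall(r"\w+", text)

class KeywordMatcher:
    # Hypothetical class -- illustrates the pluggable design only.
    def __init__(self, tokenizer: Callable[[str], List[str]] = default_tokenizer):
        self.tokenizer = tokenizer  # caller-supplied tokenization strategy
        self.keywords = set()

    def add_keyword(self, word: str) -> None:
        self.keywords.add(word)

    def extract_keywords(self, text: str) -> List[str]:
        return [tok for tok in self.tokenizer(text) if tok in self.keywords]

# A Chinese user could then plug in a real segmenter, e.g.:
#   matcher = KeywordMatcher(tokenizer=jieba.lcut)
```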
What does "stuck" mean? The model cannot be built? Or it builds but will not run?
I have not run into the hang myself, but I suspect it is related to the following. MirroredStrategy assumes that every sample is independent and that the total loss is the mean of the per-sample losses, but in-batch contrastive learning does not satisfy this condition. So contrastive learning cannot simply use MirroredStrategy directly, nor can it use gradient accumulation; even if it runs, the result is not equivalent to training with a large batch_size. In this situation, if you want multiple GPUs, the only options are Keras's original built-in multi_gpu_model, or bert4keras's own data_parallel: build a separate multi-GPU encoder model, and then attach the subsequent layers to it.
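A minimal sketch of this pattern using Keras's multi_gpu_model, with a toy Dense encoder standing in for the real bert4keras encoder (the actual model, shapes, and loss are assumptions left to the reader):

```python
import keras.backend as K
from keras.models import Model
from keras.layers import Input, Dense, Lambda
from keras.utils import multi_gpu_model  # multi-backend Keras 2.x; removed from tf.keras in TF 2.4+

# Toy stand-in encoder; in practice this would be the bert4keras
# encoder (e.g. built with build_transformer_model).
enc_in = Input(shape=(128,))
encoder = Model(enc_in, Dense(64)(enc_in))

# Data-parallelize ONLY the encoder: each GPU encodes its slice of
# the batch, which is valid because encoding one sample does not
# depend on the other samples. Requires 2 visible GPUs.
parallel_encoder = multi_gpu_model(encoder, gpus=2)

# Attach the in-batch contrastive head AFTER the merged encoder
# output, so the similarity computation sees the whole batch at once.
z = Lambda(lambda v: K.l2_normalize(v, axis=-1))(parallel_encoder.output)
sim = Lambda(lambda v: K.dot(v, K.transpose(v)))(z)  # batch x batch similarities
model = Model(parallel_encoder.input, sim)
```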