theblackcat102

Results 69 comments of theblackcat102

There is a collection of unsupervised style transfer work here: https://github.com/fuzhenxin/Style-Transfer-in-Text

@ownthink Maybe consider using the [GitHub large files page](https://docs.github.com/en/github/managing-large-files/working-with-large-files/distributing-large-binaries). A single file can be up to 2GB, which is convenient for overseas users who cannot register for Baidu Pan.

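@ownthink For anything above 50G, I would suggest a paid service such as AWS S3. If you keep relying on free cloud drives, the download experience is bound to be very poor (and the provider will not want to pay the bandwidth bill for you either). Otherwise, try putting it on [Academic Torrents](https://academictorrents.com/).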

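I have been using jieba for several years. Honestly, besides the underlying HMM code needing some fixes, the code readability does not meet 2022 standards either (just look at the variable naming: how is a newcomer supposed to understand it?). Here are my five cents of suggestions. What do the maintainers ( @botissue @fxsjy @a358003542 ) think?

* Replace the backend segmentation model with a neural model in ONNX format (so it is never tied to a particular training framework; you can use PyTorch, TF, or Paddle, whichever you like), and merge segmentation and part-of-speech prediction into a single model
* Rewrite the word-frequency DAG code
* Take pos, finalseg...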

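@shouldsee 1. What exactly does separating the weights from the dictionary mean? Currently jieba [hard-codes the HMM weights](https://github.com/fxsjy/jieba/blob/master/jieba/finalseg/prob_emit.py) in the source code. I think it should follow the more mature [HanLP approach](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/tok.html) or huggingface transformers' from_pretrained, so that users download the weights only at run time (passing a path to a model file should of course work too). 2. Can you elaborate on what exactly goes wrong if the DAG is left unchanged? I think it is mainly a readability problem. jieba does not have a single comment in its most critical step, [__cut_DAG](https://github.com/fxsjy/jieba/blob/master/jieba/__init__.py#L249). Also, paddle really has a lot of problems; search the issues here and you will find plenty of them. I have a heavily modified jieba that uses...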
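To make the readability point concrete, here is a minimal sketch of a commented DAG-based segmenter. It is not jieba's actual code: the function names `build_dag`, `best_route`, and `cut`, and the toy frequency dictionary, are hypothetical, and the dictionary lookup is a naive O(n²) scan rather than a prefix dictionary. It only illustrates the general idea of building a word graph and picking the most probable path with dynamic programming.

```python
import math


def build_dag(sentence, freq):
    """Map each start index to the inclusive end indices of dictionary words."""
    dag = {}
    n = len(sentence)
    for start in range(n):
        ends = [
            end
            for end in range(start, n)
            if freq.get(sentence[start:end + 1], 0) > 0
        ]
        # No dictionary word starts here: fall back to the single character.
        if not ends:
            ends.append(start)
        dag[start] = ends
    return dag


def best_route(sentence, dag, freq):
    """Right-to-left dynamic programming over the DAG.

    route[i] = (best log-probability of sentence[i:], end index of first word).
    """
    log_total = math.log(sum(freq.values()) or 1)
    n = len(sentence)
    route = {n: (0.0, 0)}  # empty suffix has log-probability 0
    for start in range(n - 1, -1, -1):
        route[start] = max(
            (
                # Unknown words are given a pseudo-count of 1.
                math.log(freq.get(sentence[start:end + 1]) or 1)
                - log_total
                + route[end + 1][0],
                end,
            )
            for end in dag[start]
        )
    return route


def cut(sentence, freq):
    """Segment `sentence` by following the best route from index 0."""
    dag = build_dag(sentence, freq)
    route = best_route(sentence, dag, freq)
    words = []
    start = 0
    while start < len(sentence):
        end = route[start][1]
        words.append(sentence[start:end + 1])
        start = end + 1
    return words
```

With a toy dictionary such as `{"北京": 100, "大学": 100}`, calling `cut("北京大学", freq)` follows the two dictionary words; characters that appear in no dictionary entry come back one at a time.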

Same here. Please merge the patch as soon as possible.

My guess is that this page is rendered on the client side, with the content loaded after the initial page load. Using requests only returns an empty web page ( contents...
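One way to check this guess is to look at how much visible text the raw HTML actually contains. The sketch below uses only the standard library; the helper names `visible_text` and `looks_client_rendered` and the 50-character threshold are assumptions for illustration, and `html` would be whatever the HTTP client returned (for example `requests.get(url).text`).

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script>/<style>, roughly what a non-JS client sees."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a script or style block.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a client would see without running any JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_client_rendered(html, min_chars=50):
    """Heuristic: very little static text suggests the content is filled in by JS."""
    return len(visible_text(html)) < min_chars
```

If the heuristic fires, the data is probably fetched by JavaScript after load, so a plain HTTP request will not see it and a browser-driven approach may be needed instead.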

Not that I know of. This does seem to be a very niche but promising market, though.

@wayblink I think one common use case would be clustering photos of the same person ( something iOS and Android have supported for quite a while ). Android phones...