BrambleXu
> Could you explain in more detail what "joint training with the word segmentation model" means? This part is covered in Section 3.2, Joint Training with Word Segmentation.
Could you give us more information about your goal? It would be great if you could provide some samples.
If your input file is a CSV file, there is no way to get the formatted view on the doccano annotation page. But we are going to support JSON input...
The frontend is implemented in Vue, especially the text on the annotation page. It might be more complicated than autoescape. But you can try it locally and give us some...
Thanks for your feedback. Your request about the documentation is very reasonable. But right now we devote most of our time to useful features and fixing bugs. The documentation might...
It is a typo😱 It should be 80.89%
Yes, I tried different parameters by running `test.sh` and finally got 80.89%. I am curious what config settings you used to get 82-83%.
Tokenization
- English tokenization
  - https://towardsdatascience.com/tokenization-for-natural-language-processing-a179a891bad4 the figures in this article give a good summary of the various tokenization methods
  - https://www.analyticsvidhya.com/blog/2020/05/what-is-tokenization-nlp/ has concrete examples, useful as supplementary material
  - https://neptune.ai/blog/tokenization-in-nlp lists some tokenization tools
  - https://www.kaggle.com/code/satishgunjal/tokenization-in-nlp a Kaggle tokenization tutorial that could serve as the basis for writing a tutorial
- Japanese tokenization
  - https://qiita.com/klis/items/bb9ffa4d9c886af0f531 introduces [konoha](https://github.com/himkt/konoha); the links in the article also cover how to use MeCab and SentencePiece, all of which are Japanese tokenization tools
  - https://cardinal-moon.hatenablog.com/entry/tokenize_and_subword mainly introduces BPE and SentencePiece
  - https://www.nogawanogawa.com/entry/tokenizer briefly compares the tokenization results of different tools

A toy sketch of word-level vs. subword tokenization is given below.

Converting text to feature vectors
- Before word embeddings appeared, machine-learning-style feature conversion...
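As a small companion to the links above, here is a minimal sketch contrasting word-level tokenization with subword-style segmentation, in plain Python. It uses only the standard library; the tiny subword vocabulary is made up purely for illustration. Real projects would use the tools mentioned above (NLTK or spaCy for English, MeCab/konoha for Japanese, SentencePiece or BPE for subwords).

```python
import re

def word_tokenize(text):
    """Naive English word-level tokenization: split into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)

# Toy subword vocabulary, hand-written for illustration only.
# A real vocabulary comes from training SentencePiece / BPE on a corpus.
SUBWORD_VOCAB = {"token", "ization", "un", "break", "able", "s", "."}

def subword_tokenize(word, vocab):
    """Greedy longest-match segmentation, similar in spirit to WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it is a known subword.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:           # unknown character: emit it as-is
            pieces.append(word[start])
            start += 1
        else:
            pieces.append(word[start:end])
            start = end
    return pieces

if __name__ == "__main__":
    sentence = "Tokenization makes unbreakable words breakable."
    words = word_tokenize(sentence)
    print(words)
    for w in words:
        print(w, "->", subword_tokenize(w.lower(), SUBWORD_VOCAB))
```

For example, "unbreakable" comes out as `['un', 'break', 'able']`, while a word with no matching subwords falls back to single characters, which is roughly how unknown tokens surface in subword models.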
@booxood Same as above. If it affects i18n, it's fine not to change it. Of course, it would be best to add an option without affecting i18n. If there is no suitable way to implement it, just close this issue.
### [2023-11 A Survey of Techniques for Maximizing LLM Performance](https://www.youtube.com/watch?v=ahnGLM-RC1Y&ab_channel=OpenAI)

Why optimizing LLMs is hard
- Extracting signal from noise
- Performance is hard to quantify
- Knowing when to use which optimization method

Goals of this talk
- A basic mental model
- Learning to choose the appropriate method
- Having enough information to keep optimizing on your own

As the two figures above show, optimization is not linear; instead you optimize along two axes, context and the LLM.

A typical optimization flow
- (prompt...