Wrong primary language detected from multi-language paragraph

Open FrenchMajesty opened this issue 2 years ago • 0 comments

Here is a string for which the detector returns en as the language, even though the text is clearly not predominantly English:

Launchdarkly onboarding

Launchdarkly在用户第一次登录的时候，先是把onboarding分了4个步骤。

# 1. Create new flag
这一步，LD把场景概括为三个：
a. Kill switch: 创建一个开关可以临时关闭非核心功能
b. Release: 灰度发布
c. Experiment: AB测试
d. Migration: 这个没看懂 ( a temporary flag used to migrate data or system while keeping your application available and disruption free)。
最后还有一个Custom flag。
通过这种设计，它帮助用户理清自己的头绪，让产品更加关注，同时还能从用户的行为中得到反馈（哪个用的多）。
这种设计可以称为「离散化」——本来连续的混在一起的行为被离散成几个分类。

这种设计ZenUML可以借鉴的地方：
用户画序列图可以分为两种场景，
a. 高层设计：比较适合使用异步消息表达
b. 实现细节设计：比较适合使用同步消息表达

既可以起到教育用户的作用也可以提供针对性的帮助信息。

I think the detector gets thrown off by the initial English words and does not look at the whole string to see which language is predominant.

The response I get is this:

[{"lang":"en","prob":0.9999956121315514}]

I would expect it to report Chinese first with a high probability, and perhaps English at around 20%, since the text does contain some English words.
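To illustrate what I mean by "predominant", here is a rough sanity check that does not use the library at all. It is only a sketch under my own assumptions: the function name script_shares is made up for this example, it counts only characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) against ASCII letters, and a character ratio is not a language probability. It is not how the detector works internally.

```python
from typing import Dict


def script_shares(text: str) -> Dict[str, float]:
    """Return the share of Han vs. ASCII Latin letters among counted characters.

    Hypothetical helper for illustration only; digits, punctuation and
    whitespace are ignored.
    """
    # Basic CJK Unified Ideographs block only (an assumption of this sketch).
    han = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # ASCII letters stand in for "English" here.
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = han + latin
    if total == 0:
        return {"han": 0.0, "latin": 0.0}
    return {"han": han / total, "latin": latin / total}


# One line taken from the sample string above.
sample_line = "这一步，LD把场景概括为三个："
shares = script_shares(sample_line)
print(shares)
```

Note that a plain character count is a crude measure: a line such as the first sentence of the sample, which contains long English words like "Launchdarkly" and "onboarding", can have more Latin letters than Han characters even though it reads as Chinese. That may be part of why a detector that weights the opening of the text ends up reporting en.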

Dec 21 '23 03:12 FrenchMajesty