whichlang
A blazingly fast and lightweight language detection library for Rust
Currently, this function can only return one of the 16 supported languages. What if the input text is not in one of them?

```rust
pub fn detect_language(text: &str) -> Lang;
```
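One possible way to handle text outside the supported set is to return `Option<Lang>` and give up when the top prediction is not confident enough. The sketch below is hypothetical: `Lang` is a three-variant stand-in for the real enum, `detect_language_checked` and `min_prob` are invented names, and the raw per-language scores are passed in directly because the issue does not say how whichlang exposes them. A probability threshold only catches low-confidence cases; a closed-set classifier can still be confidently wrong on a language it was never trained on.

```rust
/// Hypothetical stand-in for whichlang's `Lang`; only three variants for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    Eng,
    Fra,
    Deu,
}

/// Hypothetical wrapper: pick the best-scoring language, but return `None`
/// if its softmax probability is below `min_prob` (or if there are no scores).
/// `scores` are assumed to be raw per-language scores (e.g. linear-model logits).
pub fn detect_language_checked(scores: &[(Lang, f32)], min_prob: f32) -> Option<Lang> {
    // Largest raw score, subtracted before exponentiating for numerical stability.
    let max = scores.iter().map(|&(_, s)| s).fold(f32::NEG_INFINITY, f32::max);
    // Best-scoring language; `?` returns None early for an empty slice.
    let (best_lang, best_score) = scores
        .iter()
        .copied()
        .max_by(|a, b| a.1.total_cmp(&b.1))?;
    // Softmax denominator over all languages.
    let denom: f32 = scores.iter().map(|&(_, s)| (s - max).exp()).sum();
    let prob = (best_score - max).exp() / denom;
    if prob >= min_prob {
        Some(best_lang)
    } else {
        None
    }
}
```

A caller could then fall back to "unknown" whenever `None` comes back, instead of trusting a forced choice among the supported languages.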
Added a function `detect_multiple_languages`, which returns a `Vec` of `(language, score)` tuples sorted by score. I'm stuck on how the final scores for the languages are obtained...
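The issue does not say how whichlang computes its final per-language scores. Assuming raw scores are already available (for a linear classifier, typically one summed weight per language), a common way to make them comparable is a softmax followed by a descending sort. The helper name `rank_languages` is hypothetical, and it is generic over the language label so the sketch stays self-contained.

```rust
/// Hypothetical helper: turn raw per-language scores into probabilities
/// (softmax) and return them sorted from most to least likely.
pub fn rank_languages<L: Copy>(raw: &[(L, f32)]) -> Vec<(L, f32)> {
    // Subtract the maximum before exponentiating to avoid overflow.
    let max = raw.iter().map(|&(_, s)| s).fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<(L, f32)> = raw.iter().map(|&(l, s)| (l, (s - max).exp())).collect();
    let total: f32 = exps.iter().map(|&(_, e)| e).sum();
    // Normalize so the probabilities sum to 1 (an empty input stays empty).
    let mut probs: Vec<(L, f32)> = exps.into_iter().map(|(l, e)| (l, e / total)).collect();
    // Sort descending by probability; total_cmp gives a total order on f32.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs
}
```

Softmax preserves the ranking of the raw scores, so the top entry always matches a plain argmax; it only changes the scale so that scores read as relative confidences.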
Add ISO 639-1 output
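Assuming the detector can already report a three-letter ISO 639-3 code for each language, ISO 639-1 output amounts to a lookup table. The function name `iso639_1` and the set of languages below are illustrative only, not whichlang's API; the code pairs themselves are standard ISO 639-3 to ISO 639-1 mappings, and unknown codes return `None`.

```rust
/// Hypothetical helper: map an ISO 639-3 code to its ISO 639-1 equivalent.
/// Returns None for codes that are not in this illustrative table.
pub fn iso639_1(three_letter: &str) -> Option<&'static str> {
    let code = match three_letter {
        "ara" => "ar",
        // Mandarin has no 639-1 code of its own; "zh" is Chinese (macrolanguage).
        "cmn" => "zh",
        "deu" => "de",
        "eng" => "en",
        "fra" => "fr",
        "hin" => "hi",
        "ita" => "it",
        "jpn" => "ja",
        "kor" => "ko",
        "nld" => "nl",
        "por" => "pt",
        "rus" => "ru",
        "spa" => "es",
        "swe" => "sv",
        "tur" => "tr",
        "vie" => "vi",
        _ => return None,
    };
    Some(code)
}
```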
Is `whichlang` faster and slightly more accurate than [lingua-rs](https://github.com/pemistahl/lingua-rs)? **Note:** `lingua-rs` supports [75 languages](https://github.com/pemistahl/lingua-rs#3-which-languages-are-supported), compared with the 16 languages supported by `whichlang`.
As I understand it, `whichlang` is based on [whatlang](https://docs.rs/whatlang/0.16.2/whatlang/index.html), which currently supports 68 languages. Could `whichlang` be extended to support them as well?
Thank you for your awesome work! I was wondering if you could link to the dataset you used in the training notebook; I am referring to the `train.csv` file.
In addition to detecting spoken languages, would it be possible to detect programming languages? One big use case for this is syntax highlighting when you don't already know what language...
Currently, the function panics when given an empty string.
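Until the library handles this case itself, a caller-side guard can skip detection for input with nothing to score. This is a hypothetical sketch: `detect_non_empty` is an invented name, the detector is passed in as a closure rather than calling whichlang directly, and treating whitespace-only input like empty input is an assumption (the issue only mentions the empty string).

```rust
/// Hypothetical caller-side guard: return None for input with no
/// non-whitespace characters instead of calling a detector that may panic.
pub fn detect_non_empty<L, F>(text: &str, detect: F) -> Option<L>
where
    F: Fn(&str) -> L,
{
    if text.trim().is_empty() {
        // Nothing to classify; never invoke the detector.
        None
    } else {
        Some(detect(text))
    }
}
```

Returning `Option` makes the "no answer" case explicit in the type, rather than forcing the detector to pick one of its supported languages for empty input.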
```
whichlang/inference_short
                        time:   [227.00 ns 228.02 ns 229.51 ns]
                        thrpt:  [120.50 MiB/s 121.29 MiB/s 121.84 MiB/s]
                 change:
                        time:   [-12.746% -12.392% -11.959%] (p = 0.00 < 0.05)
                        thrpt:  [+13.583% +14.144% +14.608%]
...
```
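This looks like Criterion output, where time and throughput are tied together by simple arithmetic. Assuming a fixed number of input bytes per iteration and MiB meaning 2^20 bytes, the median figures (228.02 ns, 121.29 MiB/s) imply an input of roughly 29 bytes, and because throughput is proportional to 1/time, a -12.392% time change corresponds to the reported +14.144% throughput change. The helper names below are hypothetical.

```rust
const MIB: f64 = 1024.0 * 1024.0;

/// Throughput in MiB/s for `bytes` processed in `nanos` nanoseconds.
pub fn throughput_mib_s(bytes: f64, nanos: f64) -> f64 {
    bytes / (nanos * 1e-9) / MIB
}

/// Input size in bytes implied by a throughput (MiB/s) and a per-call time (ns).
pub fn implied_bytes(mib_s: f64, nanos: f64) -> f64 {
    mib_s * MIB * nanos * 1e-9
}

/// Relative throughput change implied by a relative time change `dt`
/// (e.g. -0.12392): throughput ~ 1/time, so the change is 1/(1 + dt) - 1.
pub fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```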