jaime-m-p

3 issues by jaime-m-p

Use flags for each Unicode category (`\p{N}`, `\p{L}`, `\p{Z}`, ...) instead of the `CODEPOINT_TYPE_*` definitions. Includes helper flags for common regex parameters like `\s` (only this one for now), `\d`, `\w`... This...

enhancement
review complexity : medium
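
To illustrate the idea of one flag per Unicode category plus a helper flag for `\s`, here is a minimal Python sketch. It is not the code from the PR: the `CodepointFlag` enum, its member names, and `codepoint_flags` are hypothetical, and the whitespace helper is approximated with `str.isspace()`.

```python
import unicodedata
from enum import IntFlag, auto


class CodepointFlag(IntFlag):
    """Hypothetical flag set: one bit per top-level Unicode category."""
    LETTER = auto()       # \p{L}
    MARK = auto()         # \p{M}
    NUMBER = auto()       # \p{N}
    PUNCTUATION = auto()  # \p{P}
    SYMBOL = auto()       # \p{S}
    SEPARATOR = auto()    # \p{Z}
    CONTROL = auto()      # \p{C}
    WHITESPACE = auto()   # helper flag for the regex class \s


# Map the first letter of the two-letter general category to its flag.
_TOP_LEVEL = {
    "L": CodepointFlag.LETTER,
    "M": CodepointFlag.MARK,
    "N": CodepointFlag.NUMBER,
    "P": CodepointFlag.PUNCTUATION,
    "S": CodepointFlag.SYMBOL,
    "Z": CodepointFlag.SEPARATOR,
    "C": CodepointFlag.CONTROL,
}


def codepoint_flags(ch: str) -> CodepointFlag:
    """Return the category flag of a single character, plus helper flags."""
    flags = _TOP_LEVEL[unicodedata.category(ch)[0]]
    if ch.isspace():  # assumption: str.isspace() stands in for \s
        flags |= CodepointFlag.WHITESPACE
    return flags
```

Because the flags are bits, a character can carry a category and a helper flag at once: a newline is both `CONTROL` and `WHITESPACE`, which a single enumerated type value cannot express.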

Add all Unicode [categories](https://www.compart.com/en/unicode/category) to `unicode-data.cpp`. Currently we are limited to the high-level categories: C, L, M, N, P, S, Z. This PR allows access to the subcategories: Cn, Cc,...

script
testing
python
Review Complexity : Medium
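
The relation between high-level categories and subcategories can be sketched with Python's standard `unicodedata` module, which reports the two-letter general category of a character. This is only an illustration, not the generator used for `unicode-data.cpp`; the helper names `group_by_subcategory` and `high_category` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def group_by_subcategory(codepoints):
    """Group integer code points by their two-letter general category."""
    groups = defaultdict(list)
    for cpt in codepoints:
        groups[unicodedata.category(chr(cpt))].append(cpt)
    return dict(groups)


def high_category(subcategory: str) -> str:
    """The high-level category is the first letter of the subcategory."""
    return subcategory[0]


if __name__ == "__main__":
    sample = [0x41, 0x61, 0x30, 0x20, 0x0A]  # 'A', 'a', '0', space, newline
    for sub, cpts in group_by_subcategory(sample).items():
        print(high_category(sub), sub, [hex(c) for c in cpts])
```

Keeping the full two-letter code loses nothing, since the high-level category can always be recovered from its first letter.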

More tokenizer fixes.

---

- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
  - [x] Low
  - [ ] Medium
  - [ ] High

---

Examples of...

testing
Review Complexity : Low
python