Many the fish
The prefix FST has no explicitly defined threshold on the number of indexed words for deciding whether prefix databases are needed. However, an undefined behavior introduces such a threshold,...
CI could have different stages to:
- run style checks
- run tests
- build documentation

With stages, tests will run in parallel and CI should be faster to check...
## Current behavior

When setting synonyms on the settings route:

```java
'推进之王': ['王维娜']
```

Synonyms are segmented and normalized before being stored in Meilisearch, and the original value is dropped....
Today, the word-splitting strategy of the query tree is handled by the [function `split_best_frequency`](https://github.com/meilisearch/milli/blob/f8697075ea6b95b2d380e757af1724e75e0f21cf/milli/src/search/query_tree.rs#L266-L287). This function splits a word into two sub-words by looking at the frequency of the...
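As a rough illustration of frequency-based word splitting, here is a minimal sketch. It is not the code from `query_tree.rs`: the `WordFrequencies` map is a hypothetical stand-in for the index, and the scoring rule (keep the split whose rarer half is the most frequent) is an assumption, since the description above is truncated.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the index: maps a word to the number of
/// documents that contain it.
type WordFrequencies = HashMap<String, u64>;

/// Split `word` into two sub-words, keeping the split whose rarer half is
/// the most frequent (assumed scoring rule). Returns `None` when no split
/// produces two words that are both known to `freqs`.
fn split_best_frequency<'a>(
    word: &'a str,
    freqs: &WordFrequencies,
) -> Option<(&'a str, &'a str)> {
    let mut best: Option<(u64, &'a str, &'a str)> = None;

    // Iterate over char boundaries; skip index 0 so the left half is never empty.
    for (i, _) in word.char_indices().skip(1) {
        let (left, right) = word.split_at(i);
        let left_freq = freqs.get(left).copied().unwrap_or(0);
        let right_freq = freqs.get(right).copied().unwrap_or(0);

        // A split is only as good as its rarer half.
        let score = left_freq.min(right_freq);
        if score > best.map_or(0, |(s, _, _)| s) {
            best = Some((score, left, right));
        }
    }

    best.map(|(_, left, right)| (left, right))
}
```

With a map where `new` and `york` are both frequent, calling `split_best_frequency("newyork", &freqs)` in this sketch returns `Some(("new", "york"))`; a word with no known halves returns `None`.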
When a PHRASE search contains the same word several times, Meilisearch returns no results.

### Steps to reproduce

1) push some documents containing several times the same...
> ⚠️: This issue is not an easy one: it requires some knowledge of Rust and more work than the other issues. I highly encourage beginners to take another issue....
Initially, we added logging timers to the indexing extractors to help debug performance issues, as in [extract_fid_word_count_docids.rs](https://github.com/meilisearch/milli/blob/4b903719a03c88f60a9073e3e2a31ad246626535/milli/src/update/index_documents/extract/extract_fid_word_count_docids.rs#L20-L21). However, these timers have not been automatically added to the...
TBD @ManyTheFish
## Summary

The benchmarks take around 15 hours to run. Moreover, we struggle to identify which benchmarks are important and which part of indexing each one affects.

##...
# Summary

Add a unit benchmark for each extractor in the indexer. The goals are:
- during the development phase, to see quickly and efficiently the impact of local optimizations
-...