ManyTheFish issues

Results: 74 issues



Harmless undefined behavior: prefix database generation

The prefix FST has no explicitly defined threshold on the number of indexed words that decides whether prefix databases are needed. However, an undefined behavior effectively introduces such a threshold,...

thoughts

indexing

tests: make CI run tests in parallel

CI could have different stages to: - run style checks - run tests - build documentation. With stages, tests will run in parallel and CI should be faster to check...

Internal representation of synonyms shouldn't be shown when displaying settings

Current behavior: when setting synonyms on the settings route, e.g. `"推进之王": ["王维娜"]`, the synonyms are segmented and normalized before being stored in Meilisearch, and the original value is dropped...

enhancement

milli
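One possible direction for the synonyms issue above, sketched under stated assumptions: keep the user's original synonyms for display and a separate normalized copy for search, so the settings route never has to show the internal representation. `SynonymSettings`, `normalize`, `set`, `displayed`, and `lookup` are hypothetical names, not Meilisearch APIs, and `normalize` only trims and lowercases; the real pipeline also segments text (e.g. Chinese), which this sketch does not attempt.

```rust
use std::collections::BTreeMap;

/// Hypothetical settings store: the original synonyms are kept verbatim for
/// display, and a normalized copy is kept for query-time lookups.
struct SynonymSettings {
    original: BTreeMap<String, Vec<String>>,
    normalized: BTreeMap<String, Vec<String>>,
}

/// Stand-in normalizer (assumption): trims and lowercases only.
/// A real tokenizer would also segment the text.
fn normalize(s: &str) -> String {
    s.trim().to_lowercase()
}

impl SynonymSettings {
    fn new() -> Self {
        SynonymSettings {
            original: BTreeMap::new(),
            normalized: BTreeMap::new(),
        }
    }

    /// Stores both representations instead of dropping the original value.
    fn set(&mut self, word: &str, synonyms: &[&str]) {
        self.original.insert(
            word.to_string(),
            synonyms.iter().map(|s| s.to_string()).collect(),
        );
        self.normalized.insert(
            normalize(word),
            synonyms.iter().map(|s| normalize(s)).collect(),
        );
    }

    /// What the settings route would display: the user's own values.
    fn displayed(&self) -> &BTreeMap<String, Vec<String>> {
        &self.original
    }

    /// What search would use: the normalized representation.
    fn lookup(&self, query_word: &str) -> Option<&Vec<String>> {
        self.normalized.get(&normalize(query_word))
    }
}
```

For example, after `set(" NYC ", &["New York City"])`, `displayed()` still returns the key `" NYC "` with `"New York City"`, while `lookup("nyc")` finds `"new york city"`. The trade-off is storing synonyms twice in exchange for a faithful settings display.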

Enhance word splitting strategy

Currently, the query tree's word-splitting strategy is handled by the [function `split_best_frequency`](https://github.com/meilisearch/milli/blob/f8697075ea6b95b2d380e757af1724e75e0f21cf/milli/src/search/query_tree.rs#L266-L287). This function splits a word into two sub-words by looking at the frequency of the...

hacktoberfest
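The frequency-based split described above can be illustrated with a minimal sketch, assuming a plain `HashMap` of word to document count stands in for the index. It tries every split point and keeps the pair whose rarer half is most frequent (a "maximize the minimum frequency" heuristic). This is an illustration of that idea, not a copy of milli's `split_best_frequency`.

```rust
use std::collections::HashMap;

/// Splits `word` into the (left, right) pair whose less frequent half is as
/// frequent as possible. `freqs` (word -> document count) is an assumed
/// stand-in for the index. Returns None if no split has both halves present.
fn split_best_frequency<'a>(
    freqs: &HashMap<&str, u64>,
    word: &'a str,
) -> Option<(&'a str, &'a str)> {
    let mut best: Option<(u64, &'a str, &'a str)> = None;
    // Skipping the first char index guarantees a non-empty left half;
    // char_indices keeps every split on a UTF-8 boundary.
    for (i, _) in word.char_indices().skip(1) {
        let (left, right) = word.split_at(i);
        let left_freq = freqs.get(left).copied().unwrap_or(0);
        let right_freq = freqs.get(right).copied().unwrap_or(0);
        let min_freq = left_freq.min(right_freq);
        // Keep the split only if both halves exist and it beats the best so far.
        if min_freq > 0 && best.map_or(true, |(b, _, _)| min_freq > b) {
            best = Some((min_freq, left, right));
        }
    }
    best.map(|(_, left, right)| (left, right))
}
```

With counts `new: 10, york: 5, ne: 1, wyork: 1`, the word `newyork` splits into `("new", "york")`, because its weaker half (5) beats the `ne`/`wyork` split (1). Scoring by the minimum penalizes splits where one half is a rare fragment.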

Phrase search containing duplicates

When running a phrase search that contains the same word several times, Meilisearch returns no results. Steps to reproduce: 1) push some documents containing several times the same...

bug

good first issue

hacktoberfest
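To make the expected behavior concrete, here is a minimal positional phrase check in which repeated words need no special casing: each phrase word is looked up at its own offset from a candidate start position. The per-document positional map and the names `positions` and `contains_phrase` are assumptions for illustration. This sketch does not reproduce milli's phrase-search code or diagnose the bug.

```rust
use std::collections::HashMap;

/// Hypothetical positional index for one document: word -> positions.
/// Positions are pushed in increasing order, so each Vec stays sorted.
fn positions(text: &str) -> HashMap<&str, Vec<usize>> {
    let mut map: HashMap<&str, Vec<usize>> = HashMap::new();
    for (pos, word) in text.split_whitespace().enumerate() {
        map.entry(word).or_default().push(pos);
    }
    map
}

/// Returns true if `phrase` appears as consecutive words. Every phrase word is
/// checked at `start + offset`, so a phrase like "a b a" works without
/// deduplicating its words.
fn contains_phrase(index: &HashMap<&str, Vec<usize>>, phrase: &[&str]) -> bool {
    let Some(first) = phrase.first() else { return false };
    let Some(starts) = index.get(*first) else { return false };
    starts.iter().any(|&start| {
        phrase.iter().enumerate().all(|(offset, word)| {
            index
                .get(*word)
                .map_or(false, |ps| ps.binary_search(&(start + offset)).is_ok())
        })
    })
}
```

For the text `"a b a b"`, the phrase `["a", "b", "a"]` matches at position 0, while `["a", "a"]` does not match because the two occurrences of `a` are never adjacent.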

Store detected language per document during indexing

> ⚠️ This issue is not an easy one: it requires some knowledge of Rust and more work than the other issues. I highly encourage beginners to pick another issue....

hacktoberfest

Add missing logging timer to extractors

Initially, we added logging timers to the indexing extractors to help debug performance issues, as in [extract_fid_word_count_docids.rs](https://github.com/meilisearch/milli/blob/4b903719a03c88f60a9073e3e2a31ad246626535/milli/src/update/index_documents/extract/extract_fid_word_count_docids.rs#L20-L21). However, these timers have not been automatically added to the...

good first issue

hacktoberfest
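The logging-timer idea above boils down to measuring the time from when an extractor starts until it returns. The sketch below is a stdlib-only stand-in built as a hypothetical RAII guard, `ScopedTimer`, that reports elapsed time on drop. It is not the timer used in the linked milli extractor. `extract_word_counts` is likewise a hypothetical extractor, and printing to stderr is an assumption; real code would go through the project's logger.

```rust
use std::time::{Duration, Instant};

/// Hypothetical scoped timer: records the start on creation and reports the
/// elapsed time when dropped, i.e. when the enclosing function returns.
struct ScopedTimer {
    name: &'static str,
    start: Instant,
}

impl ScopedTimer {
    fn new(name: &'static str) -> Self {
        ScopedTimer { name, start: Instant::now() }
    }

    fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }
}

impl Drop for ScopedTimer {
    fn drop(&mut self) {
        // Assumption: stderr stands in for the project's logging framework.
        eprintln!("{} took {:?}", self.name, self.start.elapsed());
    }
}

/// Hypothetical extractor: counts words per document. The only point of
/// interest is the first line, which times the whole function body.
fn extract_word_counts(documents: &[&str]) -> Vec<usize> {
    let _timer = ScopedTimer::new("extract_word_counts");
    documents.iter().map(|d| d.split_whitespace().count()).collect()
}
```

Binding the guard to `_timer` rather than `_` matters. A bare `_` pattern drops the value immediately, so the timer would measure nothing.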

Enhance language detection

TBD @ManyTheFish

Reduce indexing benchmarks processing time

Summary: the benchmarks take around 15 hours to process. Moreover, we struggle to identify which benchmarks are important and which part of the indexing process each one covers...

milli

tooling

maintenance

Indexing unit benchmarks

Summary: add a unit benchmark for each extractor in the indexer. The goal is: - during the development phase, to see the impact of a local optimization quickly and efficiently -...

milli

tooling

maintenance
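As a rough picture of a per-extractor unit benchmark, the sketch below times a single extractor in isolation using only the standard library. `bench` and `extract_word_docids` are hypothetical names, and the extractor is a toy word-to-document-ids map, not a milli extractor. A real setup would more likely use a dedicated harness such as Criterion, which adds warm-up runs and statistical analysis that this sketch omits.

```rust
use std::collections::BTreeMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and returns the mean wall-clock time per run.
/// black_box keeps the compiler from optimizing the measured work away.
fn bench<R, F: FnMut() -> R>(iterations: u32, mut f: F) -> Duration {
    assert!(iterations > 0, "iterations must be positive");
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f());
    }
    start.elapsed() / iterations
}

/// Hypothetical extractor under test: maps each lowercased word to the ids of
/// the documents containing it, without duplicate ids.
fn extract_word_docids(documents: &[&str]) -> BTreeMap<String, Vec<u32>> {
    let mut map: BTreeMap<String, Vec<u32>> = BTreeMap::new();
    for (id, doc) in documents.iter().enumerate() {
        let id = id as u32;
        for word in doc.split_whitespace() {
            let ids = map.entry(word.to_lowercase()).or_default();
            // Ids arrive in increasing order, so checking the last one suffices.
            if ids.last() != Some(&id) {
                ids.push(id);
            }
        }
    }
    map
}
```

Benchmarking one extractor on a fixed input, e.g. `bench(100, || extract_word_docids(&docs))`, isolates the effect of a local optimization from the rest of the indexing pipeline, which is the goal stated in the issue.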