Michael M Slusarz
Re the streaming parser: you are actually incorrect about the need for the stream to be a file resource. We use PHP temporary streams all over the place. Tremendously useful...
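PHP's php://temp stream keeps data in memory and only spills to a temporary file once it passes a size threshold (2 MB by default), so a parser that reads from "a stream" never needs a real file path. As a rough Python analogue (not the PHP code discussed here), here is a minimal sketch of a streaming parser that accepts any readable binary stream. `count_header_lines` is a hypothetical name used only for illustration.

```python
import io
import tempfile


def count_header_lines(stream) -> int:
    """Hypothetical streaming parser: count header lines up to the blank
    line that ends an RFC 822-style header block, reading line by line."""
    count = 0
    while True:
        line = stream.readline()
        if not line or line in (b"\r\n", b"\n"):
            break
        count += 1
    return count


# SpooledTemporaryFile stays in memory until max_size bytes are written,
# then rolls over to a real temporary file, much like php://temp.
with tempfile.SpooledTemporaryFile(max_size=2 * 1024 * 1024) as tmp:
    tmp.write(b"From: a@example.com\r\nSubject: test\r\n\r\nbody\r\n")
    tmp.seek(0)
    print(count_header_lines(tmp))  # 2

# A plain in-memory stream works just as well; nothing here needs a file path.
print(count_header_lines(io.BytesIO(b"X-Test: 1\n\nbody\n")))  # 1
```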
Duplicate of #44. Somebody needs to provide a test case to reproduce this, since I can't. Preventing this is the whole point of https://slusarz.github.io/dovecot-fts-flatcurve/configuration.html#fts_flatcurve_max_term_size, so I'm not sure how this could happen. The...
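A term-size limit measured in characters is not the same as one measured in bytes. The sketch below assumes the limit counts characters (code points) and that over-long terms are truncated; whether flatcurve truncates or skips such terms is not stated here. `cap_term` is a hypothetical helper, not a flatcurve function.

```python
def cap_term(term: str, max_chars: int = 30) -> str:
    """Hypothetical: truncate a term to at most max_chars code points.
    Python str slicing counts code points, not bytes."""
    return term[:max_chars]


long_term = "ä" * 40          # U+00E4 takes 2 bytes in UTF-8
capped = cap_term(long_term)  # default limit of 30, as in the config above
print(len(capped), len(capped.encode("utf-8")))  # 30 60
```

So a 30-character cap can still produce a term several times longer than 30 bytes once encoded.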
Does removing 'fts_flatcurve_max_term_size = 30' from your config help?
This crash indicates that memory allocations are causing out-of-memory errors. Have you increased the memory available to the indexer from the default? Otherwise, this isn't very useful, as all the function data has been optimized...
Try something lower, like 1000. If you have large messages (with lots of data to index), Xapian can use more than 256 MB of memory (the default vsz_limit), which will cause out-of-memory errors....
Actually, commit_limit might be an even better setting to try lowering. https://slusarz.github.io/dovecot-fts-flatcurve/configuration.html#fts_flatcurve_commit_limit
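A toy sketch of the general idea behind a commit limit: flush buffered index changes every N messages so the amount of uncommitted state (and the memory it holds) stays bounded. `FakeIndex` and `index_all` are hypothetical stand-ins, not flatcurve or Xapian APIs, and this is not how flatcurve itself is implemented.

```python
from typing import Iterable, List


class FakeIndex:
    """Hypothetical stand-in for a writable index: documents sit in a
    pending buffer until commit() moves them to committed storage."""

    def __init__(self) -> None:
        self.pending: List[str] = []
        self.committed: List[str] = []
        self.commits = 0
        self.peak_pending = 0

    def add(self, doc: str) -> None:
        self.pending.append(doc)
        self.peak_pending = max(self.peak_pending, len(self.pending))

    def commit(self) -> None:
        self.committed.extend(self.pending)
        self.pending.clear()
        self.commits += 1


def index_all(index: FakeIndex, docs: Iterable[str], commit_limit: int) -> None:
    """Commit after every `commit_limit` documents, plus once for any remainder."""
    since_commit = 0
    for doc in docs:
        index.add(doc)
        since_commit += 1
        if since_commit >= commit_limit:
            index.commit()
            since_commit = 0
    if since_commit:
        index.commit()  # flush the final partial batch


idx = FakeIndex()
index_all(idx, (f"msg{i}" for i in range(2500)), commit_limit=1000)
print(idx.commits, len(idx.committed), idx.peak_pending)  # 3 2500 1000
```

Lowering the limit trades more frequent (slower) commits for a smaller peak of uncommitted data.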
Thank you for the debug help, @edieterich. You are correct that the Utf8Iterator usage does not appear to be right, and that should be looked at. ...but with that being said,...
First, answering my own question: the generic tokenizer IS UTF-8 aware and will correctly handle a UTF-8 character that straddles the split point. Also, it turns out the Dovecot...
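A minimal sketch of the general technique for splitting a UTF-8 byte buffer without cutting a multibyte character in half: back up from the split point while the byte there is a continuation byte (bit pattern 10xxxxxx). This illustrates the idea only; it is not Dovecot's tokenizer code, and `safe_split_point` is a hypothetical name.

```python
def safe_split_point(buf: bytes, limit: int) -> int:
    """Return the largest index <= limit that does not fall inside a
    multibyte UTF-8 sequence, so both halves decode cleanly."""
    if limit >= len(buf):
        return len(buf)
    i = limit
    # Continuation bytes are 0x80-0xBF, i.e. the top two bits are 10.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "naïve café".encode("utf-8")
cut = safe_split_point(data, 3)  # byte 3 is the second byte of "ï"
print(data[:cut].decode("utf-8"), "|", data[cut:].decode("utf-8"))  # na | ïve café
```

A naive split at byte 3 would leave a dangling lead byte in the first half and an orphan continuation byte in the second, and neither half would decode.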
This code was committed, and a new release was pushed almost a month ago. Haven't heard any response in this ticket, so the assumption is that these changes fixed the...
Update: I can confirm that this is a bug in Dovecot core code (specifically the FTS tokenization code). For the generic tokenizer, it doesn't happen for ALL large strings -...