Tomoko Uchida

Results 65 comments of Tomoko Uchida

Just to note: when I tried to index 1 million Wikipedia docs with vector values (200 dimensions), the indexer process suddenly suspended after running for 30 minutes or so; it's...

I'll also try with 100 dimensions and report if I find something noticeable.

> Unfortunately there are no tests for DictionaryBuilder. Yes, it's a problem for future maintenance; I think we may need some kind of validator for binary encoded dictionaries rather than...

> At indexing: > Emit all tokens all of the time > If a given token is encountered, then also emit the 2-gram starting at the trigger token. I didn't...
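
To make the quoted indexing rule concrete, here is a minimal sketch in Python. It is only an illustration of the rule as worded above, not the implementation discussed in the issue; the function name `emit_with_triggered_bigrams`, the `triggers` set, and the space separator are all hypothetical choices made for this example.

```python
from typing import Iterable, Iterator, Set


def emit_with_triggered_bigrams(
    tokens: Iterable[str],
    triggers: Set[str],
    sep: str = " ",  # assumption: 2-gram parts are joined with a single space
) -> Iterator[str]:
    """Yield every token; when a token is a trigger, also yield the
    2-gram that starts at that token (if a following token exists)."""
    toks = list(tokens)
    for i, tok in enumerate(toks):
        # Rule 1: every token is always emitted.
        yield tok
        # Rule 2: a trigger token additionally emits the 2-gram it starts.
        if tok in triggers and i + 1 < len(toks):
            yield tok + sep + toks[i + 1]


if __name__ == "__main__":
    print(list(emit_with_triggered_bigrams(["the", "quick", "fox"], {"the"})))
```

With `"the"` as the only trigger, the input `["the", "quick", "fox"]` yields `the`, `the quick`, `quick`, `fox`. A trigger in the last position emits no 2-gram because there is no following token.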

I think the shingle filter alone works well without #1073, but the combination could be useful in some situations?

FYI... Maybe there is a corresponding issue on Lucene: https://issues.apache.org/jira/browse/LUCENE-3320 I've started to investigate it; I'd like to help or give feedback here if it works on Lucene.

Hi, thanks for your effort! Would it be possible to strictly separate the "data structure (data type)" from its "description"? I mean, relatively recent Lucene format documentation is written like this....

Hi, I was also trying to implement a shingle filter. I left a PR, but it's incomplete - I tried to explain where I got stuck in the description.

It could be interesting to port Lucene's monitor module. https://lucene.apache.org/core/8_11_0/monitor/org/apache/lucene/monitor/package-summary.html I've never tried it; I just wanted to leave a pointer here.

> Luwak is another interesting project that communicates a lot about how to do it. Yeah, Luwak was contributed back to Lucene, and it's now the monitor module I mentioned...