J Mackenzie

Results 64 comments of J Mackenzie

Thanks for checking this out, Sean! I'll add some responses inline. I'll also reiterate that this isn't a priority; I just wanted to flag it and "wishlist" it for...

Great, we'd be happy to store checkpoints on CloudStor with the remainder of the data in that case, no problem at all!

@elshize I think we'd need to run a check somewhere here: https://github.com/pisa-engine/pisa/blob/196a6e6e0b2312fad5befa58e613be4485294aa6/include/pisa/block_freq_index.hpp#L154 I can never remember the correct way to do it, but I think we need to flush and...

> > As a side question. Would it be faster/better to do lucene -> CIFF -> pisa?
>
> I don't think anybody has ever checked that, I certainly haven't....

Jimmy Lin has kindly managed to dig up some information from his personal Mac Pro (SSD):

```
time target/appassembler/bin/ExportAnseriniLuceneIndex -output cw12b-complete-20200309.ciff.gz -index lucene-index-ciff.cw12b.20200309 -description "Anserini v0.7.2, ClueWeb12-B13 regression"
real 167m21.523s...
```

Just to follow up briefly, I ran a few quick-n-dirty Gov2 experiments. Indexing Gov2 via Anserini takes ~20 mins with 30 threads. Taking this Lucene index to CIFF takes about...

Thanks Matthias, I guess we should at least document the tooling so users have a choice, including how to get queries formatted via Anserini (etc).

I thought that it was well-defined behavior? If `non_essential_lists` is 0 and we subtract 1, we will get `i` being the maximum value of a `uint64_t`. Then we...
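For anyone following along, here is a minimal sketch of the wraparound in question. It is not the actual PISA code: `minus_one` and `count_descending` are hypothetical helpers, and the `i + 1 > 0` loop condition is an assumption about how such a descending loop might be written. It only illustrates that unsigned arithmetic in C++ is defined modulo 2^64 for a `uint64_t`, so subtracting 1 from 0 is not undefined behavior.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: unsigned subtraction wraps modulo 2^64.
constexpr std::uint64_t minus_one(std::uint64_t n) {
    return n - 1;
}

// Wrapping below zero yields the maximum value, and adding 1 wraps back to 0.
static_assert(minus_one(0) == std::numeric_limits<std::uint64_t>::max());
static_assert(minus_one(0) + 1 == 0);

// Hypothetical descending loop over indices non_essential_lists - 1 down to 0.
// The condition `i + 1 > 0` relies on the wraparound: once `i` wraps to the
// maximum value, `i + 1` is 0 and the loop stops. Returns the iteration count.
inline std::uint64_t count_descending(std::uint64_t non_essential_lists) {
    std::uint64_t visited = 0;
    for (std::uint64_t i = non_essential_lists - 1; i + 1 > 0; --i) {
        ++visited;
    }
    return visited;
}
```

Under these assumptions, a count of 0 means the loop body never runs, since `i` starts at the maximum value and `i + 1` is already 0.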

@amallia Can we close this one? Or is there still a problem?

I'm happy to investigate, but I'd like to reproduce it first. I can have a look later on, and I'll report back once I have some further information.