Ferdinand Schlatt

15 comments by Ferdinand Schlatt

Hmm, after finding a bug in my code which excluded a good portion of terms from the SimString DB, the RAM reduction isn't as much as...

> Hi!
> have you solve that problem?, I have the same :(

Sort of. At the cost of including some duplicates in the SimString database, I was able to...

Hey Luca, Sure thing. I've also added that the preferred term is returned and applied black formatting to the repo, so there are a couple of additional changes. I'll create...

When setting `num_workers=0`, the dataloader doesn't use multiprocessing. Multiprocessing is only used for `num_workers>0`. That's why it works for `num_workers=0`.

> However, I don't see it connecting to the main theme of the package.

That was my thinking exactly. I have a specific use case where preceding unit parsing is...

There are multiple edge cases that I can't seem to fix and can't afford to spend more time on. In case anyone runs across this thread, I'll be...

Finally did get around to fixing stuff. Here is the fork and branch in case anyone runs into a similar use case: https://github.com/fschlatt/quantulum3/tree/inverse.

My branch allows for in-memory indexing. Instead of passing a list of documents, you could probably also pass an iterator that loads documents from disk one at a time. It most likely won't...
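
To make the iterator idea concrete, here is a minimal sketch of a generator that reads documents from disk lazily instead of building a list in memory. The function name `iter_documents` and the one-document-per-line file layout are assumptions for illustration only; whether the branch's indexer accepts such an iterator is not verified here.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_documents(path: Path) -> Iterator[str]:
    """Yield one document per non-empty line, reading lazily from disk.

    Hypothetical helper: assumes a plain-text file with one document per line.
    """
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            text = line.strip()
            if text:  # skip blank lines
                yield text


# Small demo using a temporary file with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus.txt"
    corpus.write_text("first document\n\nsecond document\n", encoding="utf-8")
    docs = list(iter_documents(corpus))
```

Note that a generator can only be consumed once and has no `len()`, so any indexing code that needs the collection size or makes several passes over the documents would have to be adapted.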

My branch doesn't support an iterator, but adds support for in-memory collections. With some minor modifications, you should be able to get it working with an iterator: https://github.com/fschlatt/ColBERT I could...

My fork supports indexing and searching with arbitrary doc ids ;)