igor0
I hope you don't mind me stalking this project, but I tried this out on enwik8 (https://github.com/igor0/memorizing-transformers-pytorch/commit/d302feee0c3d9655a92c392850c4ec5d86bff77c). I basically just ported the enwik8 training loop from another one of @lucidrains's...
I ended up having two issues with KNN on the GPU. Here are the findings so far. **1. The wheel package faiss-gpu hangs for me on A100 and A10** With...
What type of index to use then? One problem is that we don't know the distribution of the keys upfront, and the clustering approaches require that. Furthermore, the distribution of...
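Since a flat (brute-force) index has no training step, it sidesteps the distribution problem entirely. For reference, here is a minimal sketch of exact KNN search written directly in PyTorch; the function name and shapes are my own, not tied to any particular library:

```python
import torch

def knn_search(keys: torch.Tensor, queries: torch.Tensor, k: int):
    """Exact (flat) KNN search by brute force: no clustering and no
    training, so it doesn't care how the key distribution drifts.

    keys:    (num_keys, dim) memory, e.g. on the GPU
    queries: (num_queries, dim)
    returns: (distances, indices), each of shape (num_queries, k)
    """
    # L2 distance between every query and every key.
    dists = torch.cdist(queries, keys, p=2)
    return torch.topk(dists, k, dim=-1, largest=False)

# Example: 64k keys of dimension 64, a batch of 512 queries.
keys = torch.randn(65536, 64, device="cuda")
queries = torch.randn(512, 64, device="cuda")
dist, idx = knn_search(keys, queries, k=32)
```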
FlatContainer in TorchPQ looks promising as a potential flat GPU index (to avoid the challenges with clustering): https://github.com/DeMoriarty/TorchPQ/blob/main/torchpq/container/FlatContainer.py It seems like `FlatContainer::set_data_by_address()` can arbitrarily overwrite records in the flat container....
The problem is that for each training sample, we need to:

* search seq_len entries
* remove seq_len entries
* add seq_len entries

So, adds : removals : searches are...
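Because adds and removals come in equal numbers, one way to realize this pattern (and to avoid explicit removals altogether) is a fixed-capacity ring buffer where each batch of new keys simply overwrites the oldest slots, in the spirit of what `set_data_by_address()` would allow. A minimal sketch below, purely illustrative and not the project's actual memory implementation; the class name and shapes are made up:

```python
import torch

class RingKNNMemory:
    """Sketch of a fixed-capacity KNN memory. Adding seq_len new keys
    overwrites the seq_len oldest slots, so removals come for free and
    every step is one search plus one add of seq_len entries."""

    def __init__(self, capacity: int, dim: int, device: str = "cuda"):
        self.keys = torch.zeros(capacity, dim, device=device)
        self.values = torch.zeros(capacity, dim, device=device)
        self.capacity = capacity
        self.ptr = 0      # next slot to overwrite
        self.filled = 0   # number of slots holding real data

    def add(self, k: torch.Tensor, v: torch.Tensor):
        n = k.shape[0]
        idx = (self.ptr + torch.arange(n, device=k.device)) % self.capacity
        self.keys[idx] = k
        self.values[idx] = v
        self.ptr = (self.ptr + n) % self.capacity
        self.filled = min(self.filled + n, self.capacity)

    def search(self, queries: torch.Tensor, topk: int):
        keys = self.keys[: self.filled]
        dists = torch.cdist(queries, keys)   # flat / brute-force search
        dist, idx = torch.topk(dists, min(topk, self.filled), largest=False)
        return self.values[idx], dist
```

Clearing the memory between documents is then just resetting `ptr` and `filled`, with no retraining involved.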
> The number of embeddings you add at every iteration is pretty small, so the memory use will be limited until you do a few thousand steps. At this point...
> realistically, each document isn't going to exceed 32k tokens, before the next document comes along and the index needs to be cleared and retrained

If we don't need to...
Thanks for developing and maintaining this library! One thought I'd add to this.

> [...] is it possible to stream byte arrays somehow for being able to parse super large...
@G2G2G2G That's still not quite what I'm talking about. Let's say that I have this input:
```
{"article": "Article1", "sections": [{"id": "1"}, {"id": 2}]}
{"article": "Article2"}
```
As I...
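To illustrate what consuming that input one document at a time from a stream could look like, here is a minimal sketch using only the stdlib `json` module (not this library's API); the file name `articles.jsonl` is hypothetical:

```python
import json

def iter_documents(path):
    """Yield one parsed JSON document per line, without ever holding
    the whole file (or the whole parse tree) in memory at once."""
    with open(path, "rb") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for doc in iter_documents("articles.jsonl"):
    print(doc["article"])
```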
Let me add a bit more detail on this:

> Note that DeeperSpeed added a workaround for this (as a part of soft prompt tuning work), but it isn't guaranteed...