igor0
I hope you don't mind me stalking this project, but I tried this out on enwik8 (https://github.com/igor0/memorizing-transformers-pytorch/commit/d302feee0c3d9655a92c392850c4ec5d86bff77c). I basically just ported the enwik8 training loop from another one of @lucidrains's...
I ended up having two issues with KNN on the GPU. Here are the findings so far. **1. The wheel package faiss-gpu hangs for me on A100 and A10** With...
What type of index to use then? One problem is that we don't know the distribution of the keys upfront, and the clustering approaches require that. Furthermore, the distribution of...
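Since a flat (brute-force) index has no training step, it sidesteps the distribution problem entirely. For reference, here is a minimal sketch of exact KNN search written directly in PyTorch; the function name and shapes are my own, not tied to any particular library:

```python
import torch

def knn_search(keys: torch.Tensor, queries: torch.Tensor, k: int):
    """Exact (flat) KNN search by brute force: no clustering and no
    training, so it doesn't care how the key distribution drifts.

    keys:    (num_keys, dim) memory, e.g. on the GPU
    queries: (num_queries, dim)
    returns: (distances, indices), each of shape (num_queries, k)
    """
    # L2 distance between every query and every key.
    dists = torch.cdist(queries, keys, p=2)
    return torch.topk(dists, k, dim=-1, largest=False)

# Example: 64k keys of dimension 64, a batch of 512 queries.
keys = torch.randn(65536, 64, device="cuda")
queries = torch.randn(512, 64, device="cuda")
dist, idx = knn_search(keys, queries, k=32)
```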
FlatContainer in TorchPQ looks promising as a potential flat GPU index (to avoid the challenges with clustering): https://github.com/DeMoriarty/TorchPQ/blob/main/torchpq/container/FlatContainer.py It seems like `FlatContainer::set_data_by_address()` can arbitrarily overwrite records in the flat container....
The problem is that for each training sample, we need to:

* search seq_len entries
* remove seq_len entries
* add seq_len entries

So, adds : removals : searches are...
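Because adds and removals come in equal numbers, one way to realize this pattern (and to avoid explicit removals altogether) is a fixed-capacity ring buffer where each batch of new keys simply overwrites the oldest slots, in the spirit of what `set_data_by_address()` would allow. A minimal sketch below, purely illustrative and not the project's actual memory implementation; the class name and shapes are made up:

```python
import torch

class RingKNNMemory:
    """Sketch of a fixed-capacity KNN memory. Adding seq_len new keys
    overwrites the seq_len oldest slots, so removals come for free and
    every step is one search plus one add of seq_len entries."""

    def __init__(self, capacity: int, dim: int, device: str = "cuda"):
        self.keys = torch.zeros(capacity, dim, device=device)
        self.values = torch.zeros(capacity, dim, device=device)
        self.capacity = capacity
        self.ptr = 0      # next slot to overwrite
        self.filled = 0   # number of slots holding real data

    def add(self, k: torch.Tensor, v: torch.Tensor):
        n = k.shape[0]
        idx = (self.ptr + torch.arange(n, device=k.device)) % self.capacity
        self.keys[idx] = k
        self.values[idx] = v
        self.ptr = (self.ptr + n) % self.capacity
        self.filled = min(self.filled + n, self.capacity)

    def search(self, queries: torch.Tensor, topk: int):
        keys = self.keys[: self.filled]
        dists = torch.cdist(queries, keys)   # flat / brute-force search
        dist, idx = torch.topk(dists, min(topk, self.filled), largest=False)
        return self.values[idx], dist
```

Clearing the memory between documents is then just resetting `ptr` and `filled`, with no retraining involved.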
> The number of embeddings you add at every iteration is pretty small, so the memory use will be limited until you do a few thousand steps. At this point...
> realistically, each document isn't going to exceed 32k tokens, before the next document comes along and the index needs to be cleared and retrained

If we don't need to...
Thanks for developing and maintaining this library! One thought I'd add to this.

> [...] is it possible to stream byte arrays somehow for being able to parse super large...
@G2G2G2G That's still not quite what I'm talking about. Let's say that I have this input:
```
{"article": "Article1", "sections": [{"id": "1"}, {"id": 2}]}
{"article": "Article2"}
```
As I...
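To illustrate what consuming that input one document at a time from a stream could look like, here is a minimal sketch using only the stdlib `json` module (not this library's API); the file name `articles.jsonl` is hypothetical:

```python
import json

def iter_documents(path):
    """Yield one parsed JSON document per line, without ever holding
    the whole file (or the whole parse tree) in memory at once."""
    with open(path, "rb") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for doc in iter_documents("articles.jsonl"):
    print(doc["article"])
```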
Let me add a bit more detail on this:

> Note that DeeperSpeed added a workaround for this (as a part of soft prompt tuning work), but it isn't guaranteed...