The formula used for selecting the best leaf is `len(x['urls']) * (max_reductions - x['reductions']) ** 2`. It seems to work well, but I can't understand why. Is it the best formula?...
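For reference, a minimal sketch of how this score behaves, assuming each leaf is a dict with `urls` and `reductions` keys as in the formula above. The sample leaves, the `name` key, and the `leaf_score` / `best_leaf` helpers are hypothetical and only there for illustration.

```python
def leaf_score(leaf, max_reductions):
    """Score from the formula: URL count times squared remaining reductions."""
    return len(leaf["urls"]) * (max_reductions - leaf["reductions"]) ** 2


def best_leaf(leaves, max_reductions):
    """Pick the leaf with the highest score (hypothetical helper)."""
    return max(leaves, key=lambda x: leaf_score(x, max_reductions))


# Hypothetical data: leaf "a" has more URLs, leaf "b" has fewer reductions.
leaves = [
    {"name": "a", "urls": ["u1", "u2", "u3", "u4"], "reductions": 3},
    {"name": "b", "urls": ["u1", "u2"], "reductions": 1},
]
max_reductions = 4

for leaf in leaves:
    print(leaf["name"], leaf_score(leaf, max_reductions))
print("best:", best_leaf(leaves, max_reductions)["name"])
```

Because the second factor is squared, a leaf with few reductions can outrank a leaf with twice as many URLs, so the formula appears to weight the reduction count more heavily than the raw URL count.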
If I use faiss as the memory, then during inference, computing each token requires 3 kNN searches (because there are 3 memory attention layers), right? Will generation become very slow?
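To make the counting concrete, here is a rough sketch under the assumption stated in the question: one kNN search per memory attention layer per generated token. It does not use faiss. `CountingMemory` is a hypothetical brute-force stand-in, and the query vector is a placeholder, so it only illustrates how many searches are issued, not how fast they are.

```python
import math


class CountingMemory:
    """Brute-force kNN memory (stand-in, not faiss) that counts its searches."""

    def __init__(self, keys):
        self.keys = keys
        self.num_searches = 0

    def search(self, query, k):
        self.num_searches += 1
        # Sort stored keys by Euclidean distance to the query.
        dists = [(math.dist(query, key), i) for i, key in enumerate(self.keys)]
        return [i for _, i in sorted(dists)[:k]]


def generate(num_tokens, num_memory_layers, memory, k=2):
    """Issue one search per memory layer for every generated token."""
    for _ in range(num_tokens):
        query = (0.0, 0.0)  # placeholder for the layer's query vector
        for _ in range(num_memory_layers):
            memory.search(query, k)
    return memory.num_searches


memory = CountingMemory([(0.0, 1.0), (1.0, 0.0), (2.0, 2.0)])
total = generate(num_tokens=5, num_memory_layers=3, memory=memory)
print(total)  # 5 tokens * 3 memory layers
```

Under this assumption the number of searches grows as tokens times memory layers, so the per-search latency of the index is what determines the slowdown.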
Just like BLOOM or T5?
https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/megatron/data/mtf_dataset.py#L34 The `MTFDataset` class takes `documents` as an argument but doesn't use it (except in an assert statement). I think `documents` is the train/valid/test split index. Is it OK to ignore `documents`?