xmtf

Question about MTFDataset

Open noanti opened this issue 3 years ago • 1 comment

https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/megatron/data/mtf_dataset.py#L34

The MTFDataset class takes documents as an argument but doesn't use it (except in an assert statement). I think documents is the train/valid/test split index. Is it OK to ignore documents?
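To make the question concrete, here is a minimal, hypothetical sketch of the pattern being described. It is not the Megatron-DeepSpeed code: the class name, the in-memory token lists, and the "input_tokens"/"target_tokens" keys are assumptions for illustration. Because documents is only bounds-checked, passing a subset of row indices does not change what the dataset serves.

```python
from typing import Dict, List, Sequence


class AssertOnlyDataset:
    """Hypothetical stand-in for the pattern in the question.

    `documents` is validated but never used to select or reorder rows:
    __len__ and __getitem__ go straight to the underlying data.
    """

    def __init__(self, inputs: List[List[int]], targets: List[List[int]],
                 documents: Sequence[int]) -> None:
        assert len(inputs) == len(targets)
        # The only use of `documents`: bounds checks against the row count.
        assert min(documents) >= 0
        assert max(documents) < len(inputs)
        self.inputs = inputs
        self.targets = targets

    def __len__(self) -> int:
        # Length of the full data, not len(documents).
        return len(self.inputs)

    def __getitem__(self, idx: int) -> Dict[str, List[int]]:
        # `idx` indexes rows directly; `documents` plays no part here.
        return {"input_tokens": self.inputs[idx],
                "target_tokens": self.targets[idx]}


inputs = [[1, 2], [3], [4, 5, 6]]
targets = [[7], [8, 9], [10]]
ds = AssertOnlyDataset(inputs, targets, documents=[0, 1])
print(len(ds))  # 3, even though documents only covers rows 0 and 1
print(ds[2])    # {'input_tokens': [4, 5, 6], 'target_tokens': [10]}
```

So whether ignoring documents is safe depends on whether anything upstream relies on it to restrict or reorder rows.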

noanti Feb 06 '23 05:02

If I remember correctly, documents are more like "rows": in MTF, each one is typically an input plus a target.

As to why we have documents, I think we originally wanted to support random permutation but then decided against it, so I'd say it's safe to remove.
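For contrast, a minimal sketch of how documents could have supported random permutation, assuming it is a sequence of row indices and each row is an (input, target) pair. The class name and the seed parameter are hypothetical, not part of the repository. If documents is always range(len(rows)) and no shuffle is applied, the index map is the identity, so in that case dropping the argument changes nothing.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = Tuple[List[int], List[int]]  # (input tokens, target tokens)


class PermutedRowDataset:
    """Hypothetical sketch: `documents` acts as an index map,
    so position i serves row documents[i]."""

    def __init__(self, rows: List[Row], documents: Sequence[int],
                 seed: Optional[int] = None) -> None:
        assert all(0 <= d < len(rows) for d in documents)
        self.rows = rows
        self.documents = list(documents)
        if seed is not None:
            # Reproducible random permutation of the selected rows.
            random.Random(seed).shuffle(self.documents)

    def __len__(self) -> int:
        return len(self.documents)

    def __getitem__(self, idx: int) -> Row:
        return self.rows[self.documents[idx]]


rows = [([1, 2], [7]), ([3], [8, 9]), ([4, 5, 6], [10])]
subset = PermutedRowDataset(rows, documents=[2, 0])
print(len(subset))  # 2
print(subset[0])    # ([4, 5, 6], [10])
shuffled = PermutedRowDataset(rows, documents=range(3), seed=0)
```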

thomasw21 Feb 07 '23 10:02