Customized BatchSampler + Litdata StreamingDataloader
🚀 Feature
Is there any way to use a custom batch sampler with the LitData StreamingDataLoader?
Motivation
I want certain specific samples to be combined into the same batch. I can decide each batch before optimizing, so I tried loading the pre-assigned batches one at a time with batch_size=1, but that doesn't work with litdata. I'm not sure whether that is because each batch is too large: the samples in one batch total about 2.5 GB.
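To make the request concrete, here is a minimal sketch of what "pre-assigned batches" could look like as a batch sampler. The class name `PreassignedBatchSampler` and the index groups are hypothetical and only for illustration. In plain PyTorch, a map-style dataset accepts an object like this through the `batch_sampler=` argument of `DataLoader`; whether something equivalent can be used with LitData's streaming loader is exactly the open question here.

```python
from typing import Iterator, List, Sequence


class PreassignedBatchSampler:
    """Yields pre-decided groups of sample indices, one group per batch.

    Hypothetical helper: not part of LitData or PyTorch.
    """

    def __init__(self, batches: Sequence[Sequence[int]]) -> None:
        # Copy the groups so later changes by the caller don't affect iteration.
        self.batches = [list(b) for b in batches]

    def __iter__(self) -> Iterator[List[int]]:
        # Emit the groups in the order they were assigned.
        for batch in self.batches:
            yield list(batch)

    def __len__(self) -> int:
        # Number of batches per epoch.
        return len(self.batches)


# Made-up assignment: batches may differ in size.
sampler = PreassignedBatchSampler([[0, 5, 9], [2, 3], [1, 4, 6, 7]])
```

With a regular map-style dataset this would be passed as `DataLoader(dataset, batch_sampler=sampler)`, leaving `batch_size`, `shuffle`, `sampler`, and `drop_last` at their defaults.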
Hi @Phimos,
At the moment, LitData doesn’t support plugging in a custom sampler directly. However, you can try overriding the internal _create_shuffler method to customize shuffling/sampling behavior and see if it helps with your batching needs.
For reference: https://github.com/Lightning-AI/litData/blob/07705955e698f18a5921e173710a3c726c10b6d2/src/litdata/streaming/dataset.py#L274-L282