
Shard a large .beton file into a few shards

Open ItamarKanter opened this issue 3 years ago • 4 comments

Hi,
I have two related questions:

  • Is it possible to concatenate multiple .beton files? That way, the work of creating .beton files could be distributed across multiple processes.
  • Along the same lines, can new samples be appended to an existing .beton file, or do I have to create a new file from scratch (which is quite inefficient)?

ItamarKanter • Sep 05 '22 09:09
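
For the first question, one way to distribute the writing work is to give each process a disjoint slice of the dataset indices and have it write its own shard file. The snippet below is only a sketch of that index-splitting step in plain Python. The helper names `shard_indices` and `shard_path` are hypothetical and not part of FFCV. I believe `DatasetWriter.from_indexed_dataset` accepts an `indices` argument that could take one of these ranges, but that is worth checking against the FFCV docs.

```python
from typing import List


def shard_indices(num_samples: int, num_shards: int) -> List[range]:
    """Split range(num_samples) into contiguous, near-equal ranges.

    Hypothetical helper: each returned range is meant to be handed to
    one writer process, which writes its own shard file.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(num_samples, num_shards)
    shards = []
    start = 0
    for shard_id in range(num_shards):
        # The first `extra` shards take one additional sample each.
        size = base + (1 if shard_id < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


def shard_path(prefix: str, shard_id: int) -> str:
    """Build a shard file name (the naming scheme is an assumption)."""
    return f"{prefix}_{shard_id:04d}.beton"
```

Each process would then write only the samples in its own range to its own `shard_path(...)`, so a failure in one process only costs that one shard.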

I have the same question! Would appreciate any help.

yashkant • May 03 '23 21:05

Any update on this issue? Is there a way to load multiple .beton files into a single dataloader?

ItamarKanter • Jun 14 '23 05:06

What's the benefit of using multiple beton files?

tavisshore • Jun 14 '23 19:06

  • The work of creating the files can be distributed among multiple processes, which is faster and safer (a fault in one process does not affect the files written by the others).
  • It provides the flexibility to mix multiple types of data sources.
  • It is a simple way to append new samples, see #266 #325.

ItamarKanter • Jun 15 '23 06:06
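
Until there is a supported way to merge files, a possible workaround for loading several .beton files through one training loop is to build one loader per file and interleave their batches. The sketch below assumes only that each loader is a plain Python iterable of batches. `mix_loaders` is a hypothetical helper, not an FFCV API, and it does not reproduce a global shuffle across files.

```python
import random
from typing import Iterable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def mix_loaders(
    loaders: Sequence[Iterable[T]], seed: Optional[int] = None
) -> Iterator[T]:
    """Yield batches from several iterables until all are exhausted.

    At each step one of the still-active sources is picked uniformly
    at random. Batch order within each source is preserved.
    """
    rng = random.Random(seed)
    active = [iter(loader) for loader in loaders]
    while active:
        i = rng.randrange(len(active))
        try:
            yield next(active[i])
        except StopIteration:
            # This source is finished; stop sampling from it.
            active.pop(i)
```

Usage would look like `for batch in mix_loaders([loader_a, loader_b], seed=epoch): ...`, with one loader per shard file. Picking sources uniformly means small shards run out first; weighting the choice by shard size would be a natural refinement.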