Can FFCV work on the fly with a PyTorch Dataset?

Open · Vishu26 opened this issue 3 years ago · 1 comment

My general understanding of how FFCV works is that it serializes a dataset of fixed size and shape into a .beton file. In my workflow, however, I want to create FFCV data loaders on the fly so that I can sample data points of different sizes and dimensions. For example, I may want to sample an audio file at a higher sampling rate. This cannot be done with transforms, because it would require access to the source audio file. Is there a way to do this?

Jan 17 '23 18:01 Vishu26

Sadly, I don't think this is something we'll be able to support anytime soon: the .beton file is central to FFCV's data-loading strategy. I'd recommend either creating a separate dataset for each sampling rate, or creating one high-rate dataset and downsampling from there. Sorry about that.
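Below is a minimal sketch of the second suggestion (one high-rate dataset, downsampled after loading). It assumes the waveforms were written to the .beton file at the highest rate you need as fixed-length float arrays, and that each batch reaches your training loop as a NumPy array of shape (batch, samples). The `downsample` helper is hypothetical and not part of FFCV. It only handles integer rate ratios, and block averaging is a crude anti-aliasing filter, so for real audio a proper resampler such as `scipy.signal.resample_poly` or `torchaudio.functional.resample` is likely a better choice.

```python
import numpy as np


def downsample(batch: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: downsample along the last axis by an integer factor.

    Averages each non-overlapping block of `factor` samples, which acts as a
    crude low-pass (anti-aliasing) filter before decimation. Trailing samples
    that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_out = batch.shape[-1] // factor
    # Trim to a multiple of `factor`, then group samples into blocks.
    trimmed = batch[..., : n_out * factor]
    blocks = trimmed.reshape(*batch.shape[:-1], n_out, factor)
    return blocks.mean(axis=-1)


# Example: a batch of 2 one-second clips stored at 48 kHz,
# downsampled to 16 kHz (factor 3) after loading.
high_rate = 48_000
batch = np.random.randn(2, high_rate).astype(np.float32)
low = downsample(batch, high_rate // 16_000)
print(low.shape)  # (2, 16000)
```

The trade-off is that the .beton file only has to be written once, at the cost of storing and reading the largest version of every clip; writing one .beton file per rate avoids the extra I/O but multiplies the disk usage.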

Feb 28 '23 22:02 andrewilyas