Question on video loading
@JanuszL and team, hope you're doing OK.
I have a pytorch video model that assumes each batch element is a five-frame video sequence centered on a frame at index `ind`, i.e., `[ind-2, ind-1, ind, ind+1, ind+2]`. Now I need to pass a batch of such sequences to that model, i.e.:
```python
seqs = []
for ind in range(batch_size):
    seq = ...  # images [ind-2, ind-1, ind, ind+1, ind+2]
    seqs.append(seq)
```
I have an idea of how to do this in pytorch after getting the sequences from DALI. Is there a simpler way to do this directly in DALI?
Hi @danbider,
If your data is held as a video file, you can check the video reader operator; adjusting `step` and `stride` accordingly should do the trick in this case.
If you have the frames as separate image files, you can use the `external_source` operator to return any set of frames you want. The DALI image decoder doesn't support stacked images (sequences), so you need to return each frame as a separate `external_source` output, decode them, and then use `cat`.
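The index bookkeeping such an `external_source` callback would need can be sketched in plain Python (DALI-independent; `window_indices` is a hypothetical helper, not a DALI function):

```python
# Hypothetical helper: the 5-frame window of indices centered on a
# given frame, matching [ind-2, ind-1, ind, ind+1, ind+2].
def window_indices(center, radius=2):
    """Frame indices [center-radius, ..., center+radius]."""
    return list(range(center - radius, center + radius + 1))

# One window per batch element, mirroring the loop from the question
# (centers chosen so no index falls below zero).
batch_windows = [window_indices(ind) for ind in range(2, 5)]
```

Each window is then the list of frames to load, decode, and concatenate for that batch element.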
In addition to what @JanuszL wrote, this is roughly how your readers.video pipeline should look like:
```python
@pipeline_def
def pipe(batch_size=..., ...):
    video = fn.readers.video(file_list=..., sequence_length=5, step=1)
    ...
    return video
```
@JanuszL @szalpal -- thanks for the quick and elegant solution. Defining `sequence_length=5, step=1, random_shuffle=False` works fine for me. See images:
first `pipe.run()`
second `pipe.run()`
This is exactly what I need to predict successive frames in a video using a context around each frame.
However, to train my semi-supervised pytorch models, it would also be useful if the first `pipe.run()` returned a batch of sequences like the one above, while the second `pipe.run()` returned a batch of sequences from somewhere else in the video, say at indices
[100, 101, 102, 103, 104]
[101, 102, 103, 104, 105]
[102, 103, 104, 105, 106]
When applying just `random_shuffle=True`, each batch element contains a sequence from a different part of the video. See:
Do you have any suggestions for me?
Hi @danbider,
In the case of DALI, each sequence is a single, independent sample. What you can do in this case is to treat
[100, 101, 102, 103, 104]
[101, 102, 103, 104, 105]
[102, 103, 104, 105, 106]
as a single sample covering the whole span:
[100, 101, 102, 103, 104, 105, 106]
and use tensor slicing to split it into multiple outputs from DALI. Would that work for you?
@JanuszL hello,
That's clear. I've started doing something like that in torch after getting a batch from DALI. Would it be beneficial, performance-wise, to do the tensor slicing in DALI instead?
@danbider,
It depends on whether you need to process this data further. If not, doing it in Torch may be better, as you would just create a view of a tensor; in DALI there would be a copy of the data.
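The view-vs-copy point can be illustrated outside DALI. Here is a minimal sketch with NumPy's `sliding_window_view` (torch's `Tensor.unfold` behaves analogously), assuming the long sample arrives as a frame-major array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for a single long DALI sample: 7 frames of 2x2 RGB.
frames = np.arange(7 * 2 * 2 * 3).reshape(7, 2, 2, 3)

# Three overlapping 5-frame windows along the frame axis -- this is
# a view into `frames`, no data is copied.
windows = sliding_window_view(frames, window_shape=5, axis=0)

# sliding_window_view appends the window axis at the end; move it
# next to the window index to get (num_windows, 5, H, W, C).
windows = np.moveaxis(windows, -1, 1)
```

`windows[i]` is the sequence of frames `i` through `i+4`, and `windows.base` is still the original array, confirming nothing was copied.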
@JanuszL -- No need for further processing after reshaping. Let me give it a try in torch and report back. Thanks for the help.
@JanuszL @szalpal Hey, so I've made some progress. I wanted to verify the following -- I need to define two pytorch dataloaders, relying on DALI video readers as discussed above.
- The first iterates over a video with `batch_size=1` and some `sequence_length` until the video is finished. If the last batch has fewer frames than `sequence_length`, that's fine. What should the `step` be here? What should the `num_batches` argument be? Is there any argument to the pipeline or `LightningWrapper` that I should be careful about?
- As we discussed above, the second loader has `batch_size > 1` (some number), `sequence_length=5` (fixed), and `step=1` (fixed), scanning 5 frames sequentially until reaching the last five frames of the dataset (`-5:`). Once we get there, stop; don't try to grab any more frames. What should the other args be here, like `num_batches` and `LastBatchPolicy`?
Hi @danbider,
> Iterates over a video with `batch_size=1` and some `sequence_length` until the video is finished. If the last batch has fewer frames than `sequence_length`, that's fine. What should the `step` be here? What should the `num_batches` argument be? Is there any argument to the pipeline or `LightningWrapper` that I should be careful about?
You can use `pad_sequences` to pad the last, shorter sequence with empty frames. Currently, DALI assumes that all sequences from the video reader have the same length.
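What that padding does conceptually, i.e. filling the tail of a short final sequence with empty frames, can be sketched in plain NumPy (an illustration only, not the DALI implementation):

```python
import numpy as np

# The last sequence came up short: only 3 of the expected 5 frames.
short_seq = np.ones((3, 2, 2, 3), dtype=np.uint8)

# Pad with empty (zero) frames at the end so every sequence in the
# batch has the same length.
target_len = 5
pad = target_len - short_seq.shape[0]
padded = np.pad(short_seq, ((0, pad), (0, 0), (0, 0), (0, 0)))
```

The real frames are untouched; only the trailing slots are filled with zeros.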
> As we discussed above, the second loader has `batch_size > 1` (some number), `sequence_length=5` (fixed), and `step=1` (fixed), scanning 5 frames sequentially until reaching the last five frames of the dataset (`-5:`). Once we get there, stop; don't try to grab any more frames. What should the other args be here, like `num_batches` and `LastBatchPolicy`?
You can set `pad_last_batch=True` in the reader, use `LastBatchPolicy.PARTIAL`, and set `reader_name` to the video reader's name. The FW iterator will then cut the duplicated samples from the last batch (whole sequences, not the frames inside a sequence) and return just the partial batch at the very end.
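The bookkeeping behind this works out as follows (plain-Python arithmetic with hypothetical numbers, not a DALI API):

```python
import math

# Hypothetical video: 103 frames, read as 5-frame windows with step 1.
num_frames, sequence_length, step = 103, 5, 1
batch_size = 8

# Number of sequences the sliding window over the video can produce.
num_sequences = (num_frames - sequence_length) // step + 1

# With pad_last_batch=True the reader rounds up to whole batches by
# repeating the last sample; LastBatchPolicy.PARTIAL then trims those
# repeats so the final batch comes back short.
num_batches = math.ceil(num_sequences / batch_size)
last_batch_size = num_sequences - (num_batches - 1) * batch_size
```

Here 99 sequences fit into 13 batches of 8, with only 3 real samples in the last one; the iterator returns those 3 rather than the padded 8.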
These work fine for me. I will say that when I enumerate the `LightningWrapper` manually, I get the expected number of batches and the expected behavior in the last batch.
However, when I use pytorch-lightning's `trainer.predict()`, which presumably enumerates that `LightningWrapper` too, it just iterates `num_batches` times, where `num_batches` is the integer I specify as the length of the `LightningWrapper`. Have you run into that?
@danbider,
Can you provide a simple, self-contained reproduction we can run? I have noticed that PyTorch Lightning does a couple of things under the hood that require DALI to be configured in a certain way to yield the expected behavior (for example https://github.com/NVIDIA/DALI/issues/3902).
@JanuszL let me make one repro for you.
@JanuszL @szalpal was traveling, will get you a repro as soon as I can.
A question in the meantime:
say I want to sequentially read 16 frames at a time from a video, start to end, i.e., `sequence_length=16` and `step=16`, with `random_shuffle=True`. And I want to do it for a few dozen videos on disk. Does it make sense to define a pipe and a `LightningWrapper` per video? Or is there a more efficient way where I loop over a single `pipe(filenames=filenames, ...)`, but I'd need to know when we stop iterating over a given video and move on to the next? I need this in order to save prediction files properly for each video.
Hi @danbider,
> say I want to sequentially read 16 frames at a time from a video, start to end, i.e., `sequence_length=16` and `step=16`, with `random_shuffle=True`. And I want to do it for a few dozen videos on disk. Does it make sense to define a pipe and a `LightningWrapper` per video? Or is there a more efficient way where I loop over a single `pipe(filenames=filenames, ...)`, but I'd need to know when we stop iterating over a given video and move on to the next? I need this in order to save prediction files properly for each video.
DALI randomly picks sequences from all available videos, so there is no guarantee that consecutive samples in a batch come from the same video. Defining a pipeline per video would consume too much GPU memory; recreating the pipeline for each video sounds like the more efficient way, but DALI would let you know only when you have consumed all samples in the data set (run over the whole epoch). It seems DALI may not be the best choice for your use case, and you may consider checking VFR.