Video reader of full sequence at given sampling rate
Describe the question.
Hi,
I would like to use DALI to extract frames at a rate of 2 FPS, while the original videos are encoded at 25 FPS. Note that the length of the videos varies.
I think I can use the `stride` keyword to sample one frame every N frames, but I don't know how to properly get rid of `sequence_length`.
The basic idea is to reproduce the same output as I would get with this kind of code:
```python
import logging

import cv2
import moviepy.editor
import numpy as np
from tqdm import tqdm


def getDuration(video_path):
    """Get the duration (in seconds) for a video.

    Keyword arguments:
    video_path -- the path of the video
    """
    return moviepy.editor.VideoFileClip(video_path).duration


class FrameCV():
    def __init__(self, video_path, FPS=2, transform=None, start=None, duration=None):
        """Create a list of frames from a video using OpenCV.

        Keyword arguments:
        video_path -- the path of the video
        FPS -- the desired FPS for the frames (default: 2)
        transform -- the desired transformation for the frames (default: None)
        start -- the desired starting time for the list of frames (default: None)
        duration -- the desired duration time for the list of frames (default: None)
        """
        self.FPS = FPS
        self.transform = transform
        self.start = start
        self.duration = duration

        # read FPS
        self.fps_video = cv2.VideoCapture(video_path).get(cv2.CAP_PROP_FPS)
        # read duration
        self.time_second = getDuration(video_path)

        # loop until the number of frames read is consistent with the number
        # expected from the duration and the FPS
        good_number_of_frames = False
        while not good_number_of_frames:
            # read video
            vidcap = cv2.VideoCapture(video_path)
            # get number of frames
            self.numframe = int(self.time_second * self.fps_video)
            # frame drop ratio
            drop_extra_frames = self.fps_video / self.FPS
            # init list of frames
            self.frames = []
            # TQDM progress bar
            pbar = tqdm(range(self.numframe), desc='Grabbing Video Frames', unit='frame')
            i_frame = 0
            ret, frame = vidcap.read()
            # loop until there is no frame anymore
            while ret:
                # update TQDM
                pbar.update(1)
                i_frame += 1
                # skip until starting time
                if self.start is not None:
                    if i_frame < self.fps_video * self.start:
                        ret, frame = vidcap.read()
                        continue
                # skip after duration time
                if self.duration is not None:
                    if i_frame > self.fps_video * ((self.start or 0) + self.duration):
                        ret, frame = vidcap.read()
                        continue
                if i_frame % drop_extra_frames < 1:
                    # append the frame to the list
                    self.frames.append(frame)
                # read next frame
                ret, frame = vidcap.read()
            # check if the expected number of frames was read
            if self.numframe - (i_frame + 1) <= 1:
                logging.debug("Video read properly")
                good_number_of_frames = True
            else:
                logging.debug("Video NOT read properly, adjusting fps and reading again")
                self.fps_video = (i_frame + 1) / self.time_second

        # convert frames from list to numpy array
        self.frames = np.array(self.frames)

    def __len__(self):
        """Return the number of frames."""
        return len(self.frames)

    def __getitem__(self, index):
        """Return the frame at the given index."""
        return self.frames[index]
```
I also hit an error when trying to return the frame number output using `enable_frame_num=True`. The documentation specifies that `filenames` must be passed, which is the case in my test. However, I get the following error:

```
Error when constructing operator: readers__Video encountered:
[/opt/dali/dali/operators/reader/video_reader_op.h:78] Assert on "can_use_frames_timestamps_ || !enable_frame_num_" failed: frame numbers can be enabled only when `file_list`, or `filenames` with `labels` argument are passed
```

I understand the error, but I think the documentation is misleading, as it does not mention that `labels` must also be passed (I don't specify any labels in my example).
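From the assertion message, I guess the workaround would be something like this (sketch, untested; the file names are hypothetical):

```python
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

files = ["video0.mkv", "video1.mkv"]  # hypothetical paths


@pipeline_def
def frame_num_pipe(sequence_length=10):
    # Passing labels together with filenames should satisfy the assertion,
    # so frame numbers can be returned alongside the frames.
    frames, labels, frame_num = fn.readers.video(
        device="gpu",
        filenames=files,
        labels=list(range(len(files))),  # one integer label per file
        sequence_length=sequence_length,
        enable_frame_num=True,
        name="Reader",
    )
    return frames, labels, frame_num
```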
The code I used is directly derived from one of your examples:

```python
import os

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.plugin import pytorch


@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size, stride=1):
    images, num_frames = fn.readers.video(device="gpu", filenames=files, sequence_length=sequence_length,
                                          normalized=False, random_shuffle=False, image_type=types.RGB,
                                          dtype=types.UINT8, initial_fill=16, pad_last_batch=True, name="Reader",
                                          stride=stride, enable_frame_num=True,
                                          )
    images = fn.crop(images, crop=crop_size, dtype=types.FLOAT,
                     crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
                     crop_pos_y=fn.random.uniform(range=(0.0, 1.0)))
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images, num_frames


class DALILoader():
    def __init__(self, batch_size, file_root, sequence_length, crop_size, stride=1):
        container_files = [os.path.join(root, f)
                           for root, _, files in os.walk(file_root)
                           for f in files if "mkv" in f]
        self.pipeline = create_video_reader_pipeline(batch_size=batch_size,
                                                     sequence_length=sequence_length,
                                                     num_threads=2,
                                                     device_id=0,
                                                     files=container_files,
                                                     crop_size=crop_size,
                                                     stride=stride,
                                                     )
        self.pipeline.build()
        self.epoch_size = self.pipeline.epoch_size("Reader")
        self.dali_iterator = pytorch.DALIGenericIterator(self.pipeline,
                                                         ["data"],
                                                         reader_name="Reader",
                                                         last_batch_policy=pytorch.LastBatchPolicy.PARTIAL,
                                                         auto_reset=True)

    def __len__(self):
        return int(self.epoch_size)

    def __iter__(self):
        return self.dali_iterator.__iter__()


loader = DALILoader(2, "path/to/data", 10, [224, 398], 13)
```
Thanks in advance,
Renaud
Check for duplicates
- [X] I have searched the open bugs/issues and have found no duplicates for this bug report
@rvandeghen ,
sequence_length determines how many frames each sample in the output will have. In other words, if sequence_length==3, every sample in the output of readers.video operator will be a sequence of 3 frames. As you correctly noticed, using proper configuration of stride and step, you may turn a 25fps video into a 2fps one.
If I understood correctly, you'd like to create a setup where every video file fills a single batch of data. In this case, you should set the `sequence_length` as a function of the target fps and the input video duration. Something like this:

```python
sequence_length = duration * target_fps
```
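For videos of known length, a rough sketch in plain Python (note that `stride` must be an integer, so exactly 2 fps out of 25 fps is not reachable with `stride` alone):

```python
import math


def reader_params(duration_s, target_fps=2.0, source_fps=25.0):
    # stride keeps one frame out of every source_fps / target_fps frames;
    # it must be an integer, so 25 fps -> 2 fps is only approximate
    # (stride=12 gives ~2.08 fps, stride=13 gives ~1.92 fps)
    stride = round(source_fps / target_fps)
    # number of frames one sample must hold to cover the clip at the target rate
    sequence_length = math.ceil(duration_s * target_fps)
    return sequence_length, stride


# e.g. a 45-minute video sampled at 2 fps:
print(reader_params(45 * 60))  # -> (5400, 12)
```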
@szalpal Thanks for your fast response.
Yes, you are right, and since I'm new to DALI, I'm trying to correctly understand the API.
Because my videos have different durations, I guess I can use the maximum of all my durations, apply `sequence_length = max(duration) * target_fps`, set `pad_sequences` to `True`, and track the redundant frames by their frame number being -1?
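Concretely, I imagine dropping the padding like this (numpy sketch; assuming padded frames report a frame number of -1, and the array shapes are hypothetical):

```python
import numpy as np


def drop_padded(frames, frame_nums):
    """frames: (F, H, W, C) sequence; frame_nums: (F,) per-frame indices,
    with -1 marking frames added by padding."""
    return frames[frame_nums >= 0]


seq = np.zeros((6, 4, 4, 3), dtype=np.uint8)
nums = np.array([0, 13, 26, -1, -1, -1])  # last three frames are padding
kept = drop_padded(seq, nums)
print(kept.shape)  # -> (3, 4, 4, 3)
```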
Does your API provide a tool to get the duration of a video?
If this methodology is correct, then the remaining issue is that I cannot access the frame number, as mentioned at the end of the first comment.
Thanks, Renaud
@rvandeghen ,
Could you elaborate on the expected behaviour of DALI? From the example you've provided above, I assumed that the expected behaviour is to use DALI to process video files and then feed them into training. If so, I don't fully understand the reasoning behind using `max(duration)` for calculating the `sequence_length`. I do understand it, however, if the expected behaviour is just to process video files (from 25 fps to 2 fps). In the latter case, it would probably be best to use another operator (not `readers.video`). Anyway, if you describe the use case, I'll be able to help you more on this :)
If it's the training use case, it would be best if you described the input to your model (specifically the layout) and what the video files you're working with look like.
@szalpal,
Just to let you know, I'm already able to use DALI for training. However, I'm also interested in using DALI to process my video files (for inference), in the same way as the OpenCV code snippet provided above. My goal is to replace OpenCV with DALI, but for reproducibility I want to process the exact same frames; since I don't know the duration of my videos in advance, I don't know how to compute the `sequence_length`. If you think `readers.video` is not suited, please let me know.
I also ran into an unrelated problem while using my `DALILoader`. When I specify

```python
loader = DALILoader(batch_size=64, file_root="path/to/data", sequence_length=1, crop_size=[224, 398], stride=1)
```

I get the following error:
```
[/opt/dali/dali/operators/reader/loader/video_loader.h:179] ``file_list_include_preceding_frame`` uses the default value False. In future releases, the default value will be changed to True.
140588070192896 Exception in thread: [/opt/dali/dali/operators/reader/loader/video_loader.cc:683] Detected variable frame rate video. The decoder returned frame that is past the expected one
Stacktrace (6 entries):
[frame 0]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x686e8e) [0x7fdf4443fe8e]
[frame 1]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x531e6e) [0x7fdf442eae6e]
[frame 2]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x28249ac) [0x7fdf465dd9ac]
[frame 3]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x4a7cca0) [0x7fdf48835ca0]
[frame 4]: /lib64/libpthread.so.0(+0x82de) [0x7fdfa8bb32de]
[frame 5]: /lib64/libc.so.6(clone+0x43) [0x7fdfa815ae83]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[45], line 1
----> 1 loader = DALILoader(batch_size=64, file_root="path/to/data", sequence_length=1, crop_size=[224, 398], stride=1)
Cell In[3], line 15, in DALILoader.__init__(self, batch_size, file_root, sequence_length, crop_size, stride)
     13 self.pipeline.build()
     14 self.epoch_size = self.pipeline.epoch_size("Reader")
---> 15 self.dali_iterator = pytorch.DALIGenericIterator(self.pipeline,
     16                                                  ["data"],
     17                                                  reader_name="Reader",
     18                                                  last_batch_policy=pytorch.LastBatchPolicy.PARTIAL,
     19                                                  auto_reset=True)
File ~/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/plugin/pytorch.py:194, in DALIGenericIterator.__init__(self, pipelines, output_map, size, reader_name, auto_reset, fill_last_batch, dynamic_shape, last_batch_padded, last_batch_policy, prepare_first_batch)
    192 if self._prepare_first_batch:
    193     try:
--> 194         self._first_batch = DALIGenericIterator.__next__(self)
    195         # call to `next` sets _ever_consumed to True but if we are just calling it from
    196         # here we should set if to False again
    197         self._ever_consumed = False
File ~/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/plugin/pytorch.py:211, in DALIGenericIterator.__next__(self)
    208     return batch
    210 # Gather outputs
--> 211 outputs = self._get_outputs()
    213 data_batches = [None for i in range(self._num_gpus)]
    214 for i in range(self._num_gpus):
File ~/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/plugin/base_iterator.py:298, in _DaliBaseIterator._get_outputs(self)
    296 for p in self._pipes:
    297     with p._check_api_type_scope(types.PipelineAPIType.ITERATOR):
--> 298         outputs.append(p.share_outputs())
    299 except StopIteration as e:
    300     # in case ExternalSource returns StopIteration
    301     if self._size < 0 and self._auto_reset == "yes":
File ~/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/pipeline.py:1003, in Pipeline.share_outputs(self)
   1001 self._batches_to_consume -= 1
   1002 self._gpu_batches_to_consume -= 1
-> 1003 return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing GPU operator readers__Video encountered:
Error in worker thread: [/opt/dali/dali/operators/reader/loader/video_loader.cc:683] Detected variable frame rate video. The decoder returned frame that is past the expected one
Stacktrace (6 entries):
[frame 0]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x686e8e) [0x7fdf4443fe8e]
[frame 1]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x531e6e) [0x7fdf442eae6e]
[frame 2]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x28249ac) [0x7fdf465dd9ac]
[frame 3]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x4a7cca0) [0x7fdf48835ca0]
[frame 4]: /lib64/libpthread.so.0(+0x82de) [0x7fdfa8bb32de]
[frame 5]: /lib64/libc.so.6(clone+0x43) [0x7fdfa815ae83]
Current pipeline object is no longer valid.
```
but when I change to `sequence_length=2`, I still get a warning, but it does not crash:

```python
loader = DALILoader(batch_size=64, file_root="soccernet", sequence_length=2, crop_size=[224, 398], stride=1)
```

```
[/opt/dali/dali/operators/reader/loader/video_loader.h:179] ``file_list_include_preceding_frame`` uses the default value False. In future releases, the default value will be changed to True.
140588456318720 Exception in thread: [/opt/dali/dali/operators/reader/loader/video_loader.cc:683] Detected variable frame rate video. The decoder returned frame that is past the expected one
Stacktrace (6 entries):
[frame 0]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x686e8e) [0x7fdf4443fe8e]
[frame 1]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x531e6e) [0x7fdf442eae6e]
[frame 2]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x28249ac) [0x7fdf465dd9ac]
[frame 3]: /home/rvandeghen/anaconda3/envs/self/lib/python3.8/site-packages/nvidia/dali/libdali_operators.so(+0x4a7cca0) [0x7fdf48835ca0]
[frame 4]: /lib64/libpthread.so.0(+0x82de) [0x7fdfa8bb32de]
[frame 5]: /lib64/libc.so.6(clone+0x43) [0x7fdfa815ae83]
```
Any thoughts on this?
Thanks, Renaud
Hi @rvandeghen,
The error comes from the fact that DALI decodes frames based on predicted timestamps. When you ask DALI to return sequences with a given `step` and `stride`, it computes the timestamp of each frame the sequence will be composed of. Then the decoder seeks to the right place in the video stream, starts decoding frames sequentially, and skips the frames that were not requested. In the case of variable frame rate videos, the precomputed timestamps may not be accurate, as the frames are not equally distributed in time. So for some parameters (`stride`, `step`) a video may work, and for others it may not.
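To illustrate, the prediction assumes a constant frame rate, something like this simplified sketch:

```python
def predicted_timestamps(first_frame, sequence_length, stride, fps):
    # under a constant frame rate, frame i of the sequence is expected
    # at (first_frame + i * stride) / fps seconds into the stream
    return [(first_frame + i * stride) / fps for i in range(sequence_length)]


# e.g. a 3-frame sequence starting at frame 100 with stride=13 in a 25 fps stream:
print(predicted_timestamps(100, 3, 13, 25.0))  # -> [4.0, 4.52, 5.04]
# On a variable-frame-rate file, the frame the decoder returns may carry a real
# timestamp already past the predicted one, which triggers the error above.
```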
@JanuszL,
Thanks for the feedback! I had a chat with the person who created the videos, and he is aware that for some of them he also experienced a frame rate problem. I'm re-encoding all of them to see if it solves this issue; I'll keep you updated.
As I keep working with DALI, I'm running into new issues. Even though it is not really related to your business, I experience low GPU usage while training my NN with PyTorch Lightning. I checked, and the decoder usage is constant at around 80%. I opened an issue in Lightning, but do you have anything in mind that could cause this problem?
PS: It is the first time I'm using both DALI and Lightning, so I cannot easily see where it comes from.
Renaud
@rvandeghen - it is possible that data loading is still a bottleneck: despite the GPU acceleration of decoding, the training may still be faster than the ability to provide the data. I would capture a GPU profile to see how the data loading part relates to the training, and in the meantime you can replace the data loader with a random tensor (to rule out the data processing part from the measurement) to see what maximum performance you can expect from the training.
@JanuszL, re-encoding the video did not solve my problem, so my workaround is to remove the faulty videos. Everything works fine, except that the GPU utilization is still quite low, and it comes from DALI, as I checked with dummy tensors.
Here is the pipeline I use now:

```python
@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size, stride=1, shard_id=0, num_shards=1):
    images = fn.readers.video(device="gpu",
                              filenames=files,
                              sequence_length=sequence_length,
                              normalized=False,
                              random_shuffle=True,
                              image_type=types.RGB,
                              dtype=types.UINT8,
                              initial_fill=16,
                              prefetch_queue_depth=10,
                              pad_last_batch=True,
                              name="Reader",
                              stride=stride,
                              enable_frame_num=False,
                              shard_id=shard_id,
                              num_shards=num_shards,
                              )
    images = fn.crop_mirror_normalize(images,
                                      dtype=types.FLOAT,
                                      output_layout="CFHW",
                                      crop=crop_size,
                                      crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
                                      crop_pos_y=fn.random.uniform(range=(0.0, 1.0)),
                                      mean=[0.279*255, 0.452*255, 0.378*255],
                                      std=[0.188*255, 0.188*255, 0.171*255]
                                      )
    return images
```
and the loader I use:

```python
# B, F, S and utils are defined elsewhere
device_id = utils.get_rank()
shard_id = utils.get_rank()
num_shards = utils.get_world_size()
file_root = "path/to/data"
batch_size = B
sequence_length = F
container_files = [os.path.join(root, f)
                   for root, _, files in os.walk(file_root)
                   for f in files if "p.mkv" in f]
crop_size = (224, 224)
stride = S

pipeline = create_video_reader_pipeline(batch_size=batch_size,
                                        sequence_length=sequence_length,
                                        num_threads=5,
                                        device_id=device_id,
                                        shard_id=shard_id,
                                        num_shards=num_shards,
                                        files=container_files,
                                        crop_size=crop_size,
                                        stride=stride,
                                        )


class VideoDataset(pytorch.DALIGenericIterator):
    def __init__(self, *kargs, **kvargs):
        super().__init__(*kargs, **kvargs)

    def __next__(self):
        out = super().__next__()
        # DDP is used, so there is only one pipeline per process.
        # We also need to turn the dict returned by DALIGenericIterator into
        # a tensor and merge the batch and frame dimensions.
        out = out[0]["data"]
        B, C, F, H, W = out.size()
        out = out.view(B * F, C, H, W)
        return out


train_loader = VideoDataset(pipeline,
                            ["data"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.PARTIAL
                            )
```
I tried different configurations of my values B, F and S, and the (expected) conclusion is that the most performant configuration for a fixed VRAM budget M = B×F is B=1, F=M, S=1. I would like to know what the best compromise is, or at least what the significant factors are for the best trade-off between batch randomness and speed.
As a reminder, my use case is to use DALI to extract frames from videos to train a NN. My whole dataset has ~80M frames, so my first thought was to set S to a rather high value, but I know it would hurt performance. Also, let me know whether `num_threads` or `prefetch_queue_depth` (or other parameters) play a significant role.
Thanks
Hi @rvandeghen,
Regarding the efficiency of DALI decoding, the batch size is irrelevant; only increasing `sequence_length` and reducing `stride` should improve the speed (the decoder seeks less in the stream, and fewer frames are decoded only to be discarded as not belonging to the sequence).
Also, `num_threads` and `prefetch_queue_depth` shouldn't have a significant impact (if any) on GPU video decoding.
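A back-of-the-envelope sketch of why `stride` hurts (ignoring keyframe seeking):

```python
def frames_touched(sequence_length, stride):
    # with stride > 1, the decoder still decodes every frame between the kept
    # ones and discards the rest, so the decoded/kept ratio grows with stride
    kept = sequence_length
    decoded = (sequence_length - 1) * stride + 1
    return decoded, kept


decoded, kept = frames_touched(sequence_length=10, stride=13)
print(decoded, kept)  # -> 118 10: over 90% of the decoding work is discarded
```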
Hi @JanuszL,
I'm using this quite old issue to ask a new question: how can I sample only 1 random frame from each video in the `filenames` list using `readers.video`?
The end goal is to have only one sample per video per epoch, but I want the frame of a given video to be sampled randomly.
Hi @rvandeghen,
> I'm using this quite old issue to ask a new question. How can I sample only 1 random frame from each video in the `filenames` list using `readers.video`?
I'm afraid it is not currently possible. You can ask DALI to create samples that are 1 frame long, but you cannot ask it to sample each file only once during the epoch.
The only solution that can mimic that behavior is to use the `file_list` argument of the video reader. In this case you need to randomly pregenerate it and ask DALI to read it without shuffling. Each line in this file is then one sample, and the whole file is the record of the whole training. You need to manually track epochs, as from DALI's point of view it will be just one very long epoch.
@JanuszL Thanks for the reply,
I've implemented a `file_list`, where I randomly create each entry like:

```python
import random

file_list = ""
for f in files:
    max_num_frame = compute_length(f)  # get the number of frames
    frame = random.randint(0, max_num_frame - 1)
    file_list += f"{f} 0 {frame} {frame+1}\n"
```
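which I then feed to the reader roughly like this (sketch, assuming the `file_list` string built above and a hypothetical file name; `file_list_frame_num=True` so the start/end values are interpreted as frame numbers rather than timestamps):

```python
# Write the pregenerated entries to disk and point the reader at the file.
with open("file_list.txt", "w") as f:
    f.write(file_list)

frames, labels = fn.readers.video(
    device="gpu",
    file_list="file_list.txt",
    sequence_length=1,          # one frame per sample
    random_shuffle=False,       # order is already randomized in the list
    file_list_frame_num=True,   # interpret start/end as frame numbers
    name="Reader",
)
```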
Everything works fine, except that creating the dataloader takes ~15 min when my `file_list` contains only 1 frame/video for only 1 epoch, which accounts for ~240k samples. I tried with fewer samples and the time it took was significantly lower. So I still have some questions:
- Does it scale with the number of samples in the `file_list`, and thus with the number of epochs I will need, or does it scale with the number of unique videos, which is constant whatever the number of epochs?
- Does it scale with the position of the `frame` or the value of `max_num_frame` of each video?
- For the sake of my knowledge, what is happening under the hood?
Hi @rvandeghen,
I think it would be best to describe what is going on under the hood to get a better understanding of the trade-offs:
- the file list is read and parsed;
- then for each entry `get_or_open_file` is called, which opens and parses the video file if it hasn't been opened yet (this usually takes between a couple and a dozen ms). If the file has been opened previously, it is just reopened (a syscall that should be significantly faster).

The smaller the number of entries, the faster it should be, but reducing the number of unique files should make an even bigger difference. As DALI needs to build the list of all sequences, it has to open and parse all files (codec, number of frames) ahead of processing, which takes a significant amount of time in total.
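The open-once behaviour can be sketched like this (simplified; the real code also parses the codec and the frame index on first open):

```python
_open_files = {}
open_calls = []


def open_and_parse(path):
    # stands in for the expensive first open: parse codec, count frames, ...
    open_calls.append(path)
    return {"path": path}


def get_or_open_file(path):
    # the first access pays the full open-and-parse cost;
    # later entries referencing the same file hit the cache
    if path not in _open_files:
        _open_files[path] = open_and_parse(path)
    return _open_files[path]


# 4 file-list entries, but only 2 unique videos:
for entry in ["a.mkv", "b.mkv", "a.mkv", "a.mkv"]:
    get_or_open_file(entry)
print(len(open_calls))  # -> 2: the cost scales with the unique files
```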