OpenViDial

ValueError: cannot mmap an empty file

Open xiang-xiang-zhu opened this issue 4 years ago • 6 comments

When I try to inspect the shape of train.features.mmap, NumPy raises this error. How can I solve this problem?

By the way, can I use the mmap files (such as train/valid/test.features.mmap) directly as the video features, for example by saving them as .npy files for multimodal training?

Thank you.

xiang-xiang-zhu avatar Aug 23 '21 10:08 xiang-xiang-zhu

Because I didn't download the complete compressed image package, I would like to know how many images there are in the training, validation, and test sets respectively.

xiang-xiang-zhu avatar Aug 23 '21 11:08 xiang-xiang-zhu

@xiang-xiang-zhu Hi, I guess you are not familiar with the mmap format. We chose mmap instead of .npy because loading a .npy file normally reads the whole np.array into memory, and our feature file is too big for that. If you want to know the number of images without downloading them, you can download the text first; each sentence in the text should be paired with one image.
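
A minimal sketch of opening such a file without loading it into memory, which also guards against the empty-file case that triggers the error in the title (typically a sign of an incomplete download). The helper name `open_features`, the demo file, and the feature dimension used here are hypothetical; only the `float32` dtype comes from the code quoted later in this thread.

```python
import os
import tempfile

import numpy as np


def open_features(path, dim, dtype="float32"):
    """Open a raw feature file as a read-only memmap, inferring the row count from its size."""
    nbytes = os.path.getsize(path)
    if nbytes == 0:
        # np.memmap would fail here with "ValueError: cannot mmap an empty file".
        raise ValueError(f"{path} is empty; the download may be incomplete")
    row_bytes = dim * np.dtype(dtype).itemsize
    if nbytes % row_bytes != 0:
        raise ValueError(f"{path} has {nbytes} bytes, not a multiple of {row_bytes}")
    num_rows = nbytes // row_bytes
    return np.memmap(path, dtype=dtype, mode="r", shape=(num_rows, dim))


# Demo on a small synthetic file (not the real OpenViDial features).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.features.mmap")
np.arange(12, dtype="float32").reshape(3, 4).tofile(demo_path)

features = open_features(demo_path, dim=4)
print(features.shape)  # (3, 4)
```

A raw mmap file carries no header, so the shape cannot be read from the file itself; it has to be supplied or, as in this sketch, derived from the file size and a known feature dimension.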

YuxianMeng avatar Aug 23 '21 12:08 YuxianMeng

Now I want to train with my own model. Can I read the image data directly with mmap? Is the shape (image_num, feature_dim)?

xiang-xiang-zhu avatar Aug 24 '21 05:08 xiang-xiang-zhu

@xiang-xiang-zhu Yes, please refer to our corresponding code.

YuxianMeng avatar Aug 24 '21 12:08 YuxianMeng

Thank you very much. I would like to ask whether [0, 1, 2] in the jsonl file represents one conversation made up of sentences 0 to 2, where 1 is the response to 0 and 2 is the response to 1. And are the conversations in [3, 4, 5] unrelated to [0, 1, 2] during training?

By the way, when I read the mmap file with the code np.memmap(feature_file(data_dir, split), dtype='float32', mode='r', shape=(self.total_num, self.dim)), does the array index correspond to the image index? For example, after reading train.features.mmap, is the nth element of the array the feature of the nth training image?
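
If row n does hold the feature of image n, as this question supposes, a batch of image features can be fetched by indexing the memmap with a list of image indices. The sketch below uses a synthetic file; `total_num`, `dim`, and the file name are hypothetical stand-ins for the dataset's real values.

```python
import os
import tempfile

import numpy as np

total_num, dim = 5, 3  # hypothetical sizes; the real values come from the dataset
path = os.path.join(tempfile.mkdtemp(), "demo.features.mmap")

# Write a synthetic file in which row n is filled with the value n.
writer = np.memmap(path, dtype="float32", mode="w+", shape=(total_num, dim))
writer[:] = np.arange(total_num, dtype="float32")[:, None]
writer.flush()
del writer

# Re-open read-only, mirroring the np.memmap call quoted above.
features = np.memmap(path, dtype="float32", mode="r", shape=(total_num, dim))

image_ids = [4, 0, 2]
# Indexing with a list copies only the requested rows into memory.
batch = np.asarray(features[image_ids])
print(batch[:, 0])  # [4. 0. 2.]
```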

xiang-xiang-zhu avatar Aug 24 '21 12:08 xiang-xiang-zhu

@xiang-xiang-zhu Yes, both of your comments are right.
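
A minimal sketch of the dialogue grouping confirmed here, assuming each line of the jsonl file is a JSON list of consecutive sentence indices. The sample content and the helper name `response_pairs` are hypothetical. Pairs are formed only inside a group, so no pair links sentence 2 to sentence 3.

```python
import json

# Hypothetical file content: one JSON list of sentence indices per line.
jsonl_text = "[0, 1, 2]\n[3, 4, 5]\n"


def response_pairs(lines):
    """Yield (context_index, response_index) pairs within each dialogue group."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ids = json.loads(line)
        # Each sentence is the response to the one directly before it.
        for prev, nxt in zip(ids, ids[1:]):
            yield prev, nxt


pairs = list(response_pairs(jsonl_text.splitlines()))
print(pairs)  # [(0, 1), (1, 2), (3, 4), (4, 5)]
```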

YuxianMeng avatar Aug 25 '21 01:08 YuxianMeng