Wang Peng
@zzhanghub @Charles-Xie Changing the dimension of `image_position_idx` to 1026 keeps it consistent with the dimension of `embed_positions`; it shouldn't affect performance. I guess you save the model during...
@zzhanghub There could be many reasons: 1. We use Pillow to open and save the picture, and some image information is lost in this process. I don't know if...
@zzhanghub 1. If we simply use `T.CenterCrop` after `T.RandomResize`, the processed image may not contain the target object, so we use `T.ObjectCenterCrop`. Another way is to directly resize the image...
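To make the idea concrete, here is a minimal sketch of what an object-centered crop can look like; the function name and signature are hypothetical and the actual `T.ObjectCenterCrop` in the repo may differ in details. It computes a square crop window centered on the target object's bounding box, clamped to the image bounds, so the object is guaranteed to survive the crop:

```python
def object_center_crop_box(img_w, img_h, obj_box, crop_size):
    """Hypothetical sketch: return (left, top, right, bottom) of a
    crop_size x crop_size window centered on obj_box = (x0, y0, x1, y1),
    clamped so the window stays inside the image."""
    # Center of the target object's bounding box.
    cx = (obj_box[0] + obj_box[2]) / 2
    cy = (obj_box[1] + obj_box[3]) / 2
    # Top-left corner of the crop, clamped to the image bounds.
    left = max(0, min(int(round(cx - crop_size / 2)), img_w - crop_size))
    top = max(0, min(int(round(cy - crop_size / 2)), img_h - crop_size))
    return (left, top, left + crop_size, top + crop_size)


# Example: a 224x224 crop on a 640x480 image, centered on a box
# near the image center; the returned window always contains the box.
crop = object_center_crop_box(640, 480, (300, 200, 340, 240), 224)
```

A plain `T.CenterCrop` always crops around the image center, so an object near the border can be cut out entirely; centering on the box (and clamping at the edges) avoids that.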
Yes, you can do as Abay says. We remove these punctuations because most of the descriptions in COCO are single sentences, and punctuation is not helpful for CIDEr calculation.
@zml110120 Set `transtab = str.maketrans({key: None for key in string.punctuation if key != ','})` or just remove this line.
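For context, here is what that translation table does end to end; the example caption is made up, but the `str.maketrans`/`str.translate` usage is exactly the snippet above, which strips all punctuation except commas:

```python
import string

# Build a table that deletes every punctuation character except ','
# (pass None as the value to delete a character).
transtab = str.maketrans({key: None for key in string.punctuation if key != ','})

caption = "a man, wearing a hat, rides a horse!"
cleaned = caption.translate(transtab)
print(cleaned)  # -> a man, wearing a hat, rides a horse
```

Removing the filter condition (`if key != ','`) gives the original behavior of stripping all punctuation, which suits COCO-style single-sentence captions.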
@abaybektursun For detection, we follow [pix2seq](https://arxiv.org/abs/2109.10852) to quantize the original coordinates into discretized values, so we can use simple cross-entropy for this task. Back to your question, I think you...
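A minimal sketch of the quantization idea (bin count and rounding details are illustrative, not necessarily what the repo uses): each continuous coordinate is mapped to one of `num_bins` discrete values, so a box becomes four integer tokens that a cross-entropy loss can score like ordinary vocabulary items.

```python
def quantize(coord, img_size, num_bins=1000):
    """Map a coordinate in [0, img_size] to an integer bin in [0, num_bins - 1]."""
    bin_idx = int(coord / img_size * (num_bins - 1) + 0.5)  # round to nearest bin
    return max(0, min(num_bins - 1, bin_idx))               # clamp to valid range


def dequantize(bin_idx, img_size, num_bins=1000):
    """Approximate inverse: map a bin index back to a continuous coordinate."""
    return bin_idx / (num_bins - 1) * img_size


# Example: a box (x0, y0, x1, y1) on a 640x480 image becomes four tokens.
box = (32.5, 110.0, 400.7, 300.2)
sizes = (640, 480, 640, 480)
tokens = [quantize(c, s) for c, s in zip(box, sizes)]
```

The reconstruction error is bounded by half a bin width (`img_size / (num_bins - 1) / 2`), which is sub-pixel for typical image sizes at 1000 bins.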
@cxy990729 Did you run `evaluate_caption.sh` successfully? If it ran successfully, the JSON file should be generated in `${result_path}` (defaults to `../../results/caption/` relative to the path of the shell script). Before...
@cxy990729 Can you provide the complete log? A normal log should look like this:

```
2022-03-19 21:20:01 | INFO | ofa.evaluate | loading model(s) from ../../checkpoints/caption_large_best_clean.pt
2022-03-19 21:20:01 | INFO |...
```
@Sultanax **Torch not compiled with CUDA enabled**. Did you install torch with the correct CUDA version? In addition, if you use a custom dataset, this line `python coco_eval.py ../../results/caption/test_predict.json ../../dataset/caption_data/test_caption_coco_format.json` is...
> VQA2.0 dataset uses COCO images

I know, but the VQA2.0 dataset only has 660k QA pairs (train+val). Where did the extra 400k QA pairs come from?