Wang Peng
@zzhanghub @Charles-Xie Changing the dimension of `image_position_idx` to 1026 keeps it consistent with the dimension of `embed_positions`; it shouldn't affect performance. I guess you save the model during...
@zzhanghub There could be many reasons: 1. We use Pillow to open and save the picture, and some image information is lost in this process. I don't know if...
@zzhanghub 1. If we simply use `T.CenterCrop` after `T.RandomResize`, the processed image may not contain the target object, so we use `T.ObjectCenterCrop`. Another way is to directly resize the image...
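To make the idea concrete, here is a minimal sketch of what an object-centered crop can look like; the function name and signature are hypothetical and the actual `T.ObjectCenterCrop` in the repo may differ in details. It computes a square crop window centered on the target object's bounding box, clamped to the image bounds, so the object is guaranteed to survive the crop:

```python
def object_center_crop_box(img_w, img_h, obj_box, crop_size):
    """Hypothetical sketch: return (left, top, right, bottom) of a
    crop_size x crop_size window centered on obj_box = (x0, y0, x1, y1),
    clamped so the window stays inside the image."""
    # Center of the target object's bounding box.
    cx = (obj_box[0] + obj_box[2]) / 2
    cy = (obj_box[1] + obj_box[3]) / 2
    # Top-left corner of the crop, clamped to the image bounds.
    left = max(0, min(int(round(cx - crop_size / 2)), img_w - crop_size))
    top = max(0, min(int(round(cy - crop_size / 2)), img_h - crop_size))
    return (left, top, left + crop_size, top + crop_size)


# Example: a 224x224 crop on a 640x480 image, centered on a box
# near the image center; the returned window always contains the box.
crop = object_center_crop_box(640, 480, (300, 200, 340, 240), 224)
```

A plain `T.CenterCrop` always crops around the image center, so an object near the border can be cut out entirely; centering on the box (and clamping at the edges) avoids that.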
Yes, you can do as Abay says. We remove these punctuations because most of the descriptions in COCO are single sentences, and punctuation is not helpful for CIDEr calculation.
@zml110120 Set `transtab = str.maketrans({key: None for key in string.punctuation if key != ','})` or just remove this line.
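For context, here is what that translation table does end to end; the example caption is made up, but the `str.maketrans`/`str.translate` usage is exactly the snippet above, which strips all punctuation except commas:

```python
import string

# Build a table that deletes every punctuation character except ','
# (pass None as the value to delete a character).
transtab = str.maketrans({key: None for key in string.punctuation if key != ','})

caption = "a man, wearing a hat, rides a horse!"
cleaned = caption.translate(transtab)
print(cleaned)  # -> a man, wearing a hat, rides a horse
```

Removing the filter condition (`if key != ','`) gives the original behavior of stripping all punctuation, which suits COCO-style single-sentence captions.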
@abaybektursun For detection, we follow [pix2seq](https://arxiv.org/abs/2109.10852) to quantize the original coordinates into discretized values, so we can use simple cross-entropy for this task. Back to your question, I think you...
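A minimal sketch of the quantization idea (bin count and rounding details are illustrative, not necessarily what the repo uses): each continuous coordinate is mapped to one of `num_bins` discrete values, so a box becomes four integer tokens that a cross-entropy loss can score like ordinary vocabulary items.

```python
def quantize(coord, img_size, num_bins=1000):
    """Map a coordinate in [0, img_size] to an integer bin in [0, num_bins - 1]."""
    bin_idx = int(coord / img_size * (num_bins - 1) + 0.5)  # round to nearest bin
    return max(0, min(num_bins - 1, bin_idx))               # clamp to valid range


def dequantize(bin_idx, img_size, num_bins=1000):
    """Approximate inverse: map a bin index back to a continuous coordinate."""
    return bin_idx / (num_bins - 1) * img_size


# Example: a box (x0, y0, x1, y1) on a 640x480 image becomes four tokens.
box = (32.5, 110.0, 400.7, 300.2)
sizes = (640, 480, 640, 480)
tokens = [quantize(c, s) for c, s in zip(box, sizes)]
```

The reconstruction error is bounded by half a bin width (`img_size / (num_bins - 1) / 2`), which is sub-pixel for typical image sizes at 1000 bins.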
@cxy990729 Did you run `evaluate_caption.sh` successfully? If it ran successfully, the JSON file should be generated in `${result_path}` (defaults to `../../results/caption/` relative to the path of the shell script). Before...
@cxy990729 Can you provide the complete log? A normal log should look like this:

```
2022-03-19 21:20:01 | INFO | ofa.evaluate | loading model(s) from ../../checkpoints/caption_large_best_clean.pt
2022-03-19 21:20:01 | INFO |...
```
@Sultanax **Torch not compiled with CUDA enabled**. Did you install torch with the correct CUDA version? In addition, if you use a custom dataset, this line `python coco_eval.py ../../results/caption/test_predict.json ../../dataset/caption_data/test_caption_coco_format.json` is...
> VQA2.0 dataset uses COCO images

I know, but the VQA2.0 dataset only has 660k QA pairs (train+val). Where did the extra 400k QA pairs come from?