
AS-V2 10M pretrain filtering strategy

Open · wsluma opened this issue 10 months ago · 0 comments

Hi there,

Regarding https://huggingface.co/datasets/OpenGVLab/AS-V2/blob/main/as_pretrain_10m.json, which is described as "as_pretrain_10m.json: the filtered 10M samples in AS-1B, which are used in the pretraining phase of Stage 2."

What filtering strategy did you use? Does AS-1B have some shortcomings?

Thank you for the awesome work.

wsluma · Apr 06 '25 15:04