
Add ColQwen3 and ColQwen3MoE

Open selimcavas opened this issue 4 months ago • 8 comments

  • add ColQwen3 + ColQwen3MoE wrappers around Qwen3VL backbones
  • add a training entrypoint for Qwen3
  • update transformers to v4.57.1 to access the Qwen3-VL backbones

I am currently unable to test the training script due to GPU constraints; this is mostly a draft implementation done with Codex. MoE processing is currently the same as dense, but I kept the implementations separate to leave room for them to diverge later.
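For context on what these wrappers ultimately feed: ColPali-style models expose token-level (multi-vector) embeddings scored with late interaction (MaxSim). Below is a minimal sketch of that scoring step, assuming L2-normalized embeddings; the function names are illustrative and not the colpali-engine API.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Normalize each row vector to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT/ColPali-style late-interaction score.

    query_embs: (n_query_tokens, dim), L2-normalized
    doc_embs:   (n_doc_patches, dim), L2-normalized

    For each query token, take its maximum cosine similarity over all
    document patches, then sum over query tokens.
    """
    sims = query_embs @ doc_embs.T           # (n_query_tokens, n_doc_patches)
    return float(sims.max(axis=1).sum())     # max over patches, sum over tokens
```

Any ColQwen3 or ColQwen3MoE wrapper would just need to emit per-token embeddings from the Qwen3-VL backbone (typically via a small linear projection head) that plug into scoring like this.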

selimcavas avatar Nov 08 '25 20:11 selimcavas

I forgot to tag @ManuelFay

selimcavas avatar Nov 10 '25 08:11 selimcavas

Surely pretty nice, but we can't merge things that haven't been tested through training!

@QuentinJGMace has been training Qwen3VL models recently; there is a branch open already. I'll leave this one open so he can cherry-pick what he wants from both branches. Thanks for the contrib!

ManuelFay avatar Nov 10 '25 10:11 ManuelFay

Hey @selimcavas! Thanks for the contrib.

I'm not sure about implementing support for MoE models, as I don't think we'll train one (and none exists at the moment). But if one is trained one day, I'll be happy to merge the code to support it.

As @ManuelFay said, I've been experimenting a bit with Qwen3. As I'm soon on (long) holidays, I'm not sure when a new model will come out, but one should eventually :)

QuentinJGMace avatar Nov 10 '25 11:11 QuentinJGMace

Maybe we can pass this off to @mlconti1 ?

ManuelFay avatar Nov 10 '25 11:11 ManuelFay

Okay, I might try training a model by adjusting the params. I currently have an RTX 5090. Approximately how many GPU hours (H100) does it take to train a full model such as ColQwen2.5? I plan to train the Qwen3-VL 2B model.

selimcavas avatar Nov 10 '25 11:11 selimcavas

> Okay, I might try training a model by adjusting the params. I currently have an RTX 5090. Approximately how many GPU hours (H100) does it take to train a full model such as ColQwen2.5? I plan to train the Qwen3-VL 2B model.

I'm casually training a ColQwen3-VL-2B on an RTX 5090. I'm expecting it to take roughly 16 hours, with checkpoints every 250 steps and tracking via wandb.

PR in my fork, if you want to have a look: https://github.com/athrael-soju/colpali/pull/6/files

I think it has the potential to be a great colpali model, and the recipe is already there from previous models, so why not?

athrael-soju avatar Nov 20 '25 22:11 athrael-soju

Hi, sorry for the delay, just came back from holidays too! Indeed, I was interested in picking up where @QuentinJGMace left off; we might have some ideas for new data mixes, but so far nothing is running. I'll try to find some time next week to have a look at that. Thanks for sharing @athrael-soju, and let us know how the run goes!

mlconti1 avatar Nov 21 '25 08:11 mlconti1

> Hi, sorry for the delay, just came back from holidays too!
>
> Indeed, I was interested in picking up where @QuentinJGMace left off; we might have some ideas for new data mixes, but so far nothing is running. I'll try to find some time next week to have a look at that. Thanks for sharing @athrael-soju, and let us know how the run goes!

It plateaued before one epoch, unfortunately. I've been having issues with the dataset and also had to update some files in colpali_engine to get it to run.

I recall not having any of these issues when I was experimenting with colintern.

Feel free to check my PR if you get a chance, but I'll try again soon.

athrael-soju avatar Nov 22 '25 11:11 athrael-soju