Add ColQwen3 and ColQwen3MoE
- add ColQwen3 + ColQwen3MoE wrappers around Qwen3VL backbones
- add training entrypoint for Qwen3
- update transformers to v4.57.1 to access the Qwen3-VL backbones
I am currently unable to test the training script due to GPU constraints; this is mostly a draft implementation done with Codex. MoE processing is currently the same as dense. I kept the implementations separate to leave room for them to diverge later.
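To illustrate the "separate but currently identical" choice, here is a minimal sketch of one way to structure it. The class names, the `process_queries` method, and the query prefix are all hypothetical placeholders for illustration, not the actual classes or API in this PR or in colpali_engine.

```python
class DenseProcessorSketch:
    """Hypothetical stand-in for the dense processor (not the PR's real class)."""

    def process_queries(self, queries):
        # Placeholder preprocessing (assumption): trim whitespace, add a prefix.
        return [f"Query: {q.strip()}" for q in queries]


class MoEProcessorSketch(DenseProcessorSketch):
    """Separate MoE entry point that currently inherits all dense behavior."""

    # No overrides yet. MoE-specific preprocessing could be added here later
    # without touching the dense class or the code that calls it.
    pass


if __name__ == "__main__":
    dense = DenseProcessorSketch()
    moe = MoEProcessorSketch()
    # Both paths produce the same output today.
    print(dense.process_queries([" what is shown on page 3? "]))
    print(moe.process_queries([" what is shown on page 3? "]))
```

Callers can already import the MoE name, so a later divergence would not require changes on their side.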
I forgot to tag @ManuelFay
Surely pretty nice, but we can't merge things that have not been tested through training!
@QuentinJGMace has been training Qwen3VL models recently, and there is a branch open already. I'll leave this one open so he can cherry-pick what he wants from both branches! Thanks for the contrib!
Hey @selimcavas! Thanks for the contrib.
I'm not sure about implementing support for MoE models as I don't think we'll train one (and none exists at the moment). But if one is trained one day I'll be happy to merge the code to support it.
As @ManuelFay said, I've been experimenting a bit with Qwen3. As I'm soon on (long) holidays, I'm not sure when a new model will come out, but one should eventually :)
Maybe we can pass this off to @mlconti1 ?
Okay, I might try training a model by adjusting the params. I currently have an RTX 5090. Approximately how many GPU hours (H100) does it take to train a full model such as ColQwen2.5? I planned to train the Qwen3 VL 2B model.
I'm casually training Colqwen3-vl-2B on an RTX 5090. I'm expecting it to take roughly 16 hours, with checkpoints every 250 steps and tracking via wandb.
PR in my fork, if you want to have a look: https://github.com/athrael-soju/colpali/pull/6/files
I think it has the potential to be a great ColPali model, and the recipe is already there from previous models, so why not?
Hi, sorry for the delay, just came back from holidays too! Indeed, I was interested in picking up where @QuentinJGMace left off. We might have some ideas for new data mixes, but so far nothing is running. I'll try to find some time next week to have a look at that. Thanks for sharing @athrael-soju, and let us know how the run goes!
Unfortunately, it plateaued before 1 epoch. I've been having issues with the dataset and also had to update some files in colpali_engine to get it to run.
I recall not having any of these issues when I was experimenting with colintern.
Feel free to check my PR if you get a chance, but I'll try again soon.