
Inconsistent training results with the same random seed

Open · VickiCui opened this issue 6 months ago

Hi, I've observed that when running training multiple times with the same random seed, the results are not fully reproducible — there's noticeable variation in the final performance metrics.

Have you observed this issue in your experiments? Do you have recommendations for achieving better reproducibility?

Thanks in advance for any insights!

VickiCui avatar Aug 21 '25 03:08 VickiCui

Hey - we did notice slight variations, but nothing huge... Did you set the numpy and torch seeds, set all the determinism flags, and train on the same hardware with the same batch size, and also eval with a consistent batch size? Do you merge the LoRAs before eval? What magnitude are we talking about, and on which evals?

ManuelFay avatar Oct 03 '25 07:10 ManuelFay

Hi, I’m encountering a similar situation. I fix the random seed before launching training in the trainer script and run with the same config on the same hardware, but still see some variance in the final metrics (e.g., one run averages around 80%, another around 81%). For reference, this is how I currently set the seeds and determinism flags (multi-GPU, rank-aware):

import os
import random

import numpy as np
import torch
from transformers import set_seed

GLOBAL_SEED = 42  # whatever fixed seed is used

os.environ["PYTHONHASHSEED"] = str(GLOBAL_SEED)

def random_seed(seed=GLOBAL_SEED, rank=0):
    set_seed(seed + rank)  # transformers: seeds random, numpy, torch, torch.cuda
    torch.manual_seed(seed + rank)
    torch.cuda.manual_seed_all(seed + rank)  # covers all visible GPUs
    np.random.seed(seed + rank)
    random.seed(seed + rank)

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True, warn_only=True)

rank = int(os.environ.get("LOCAL_RANK", 0))
random_seed(rank=rank)
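Two determinism sources that global seeding alone does not pin down (a generic sketch, not ColPali-specific code; the toy dataset and seed value here are made up): `torch.use_deterministic_algorithms(True)` additionally requires the `CUBLAS_WORKSPACE_CONFIG` environment variable on CUDA 10.2+, and `DataLoader` shuffling and worker RNG are driven by their own generator, so they need to be seeded explicitly:

```python
import os
import random

import numpy as np
import torch

# Deterministic cuBLAS kernels need this set before the first CUDA call;
# otherwise use_deterministic_algorithms(True) can raise or fall back.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

def seed_worker(worker_id):
    # Each DataLoader worker has its own RNG state; derive it from torch's
    # seed so numpy/random inside the worker are reproducible too.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.arange(10.0)),  # toy dataset
    batch_size=2,
    shuffle=True,            # shuffle order now depends only on `g`
    worker_init_fn=seed_worker,
    generator=g,
)
```

Re-seeding `g` before a second pass reproduces the exact same batch order; unseeded shuffle order is one variance source that the snippet above does not control.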

Is this difference avoidable, or is this level of fluctuation normal?

tausenden avatar Dec 01 '25 11:12 tausenden

Hello, I have received your email and will handle it as soon as possible.

VickiCui avatar Dec 01 '25 11:12 VickiCui

Is this with Lora ? Or without ?

ManuelFay avatar Dec 01 '25 11:12 ManuelFay

Yes, we use a similar LoRA setup with ColPali-v1.3. We evaluate the model using the ViDoRe benchmark API; the LoRA adapter is not merged before eval.
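For context on why merging before eval can matter (a generic sketch with hypothetical dimensions, not the ColPali/peft code): merging folds the low-rank update into the base weight, W' = W + (alpha/r) · B · A, which is algebraically equivalent to the unmerged forward pass but not necessarily bitwise identical at float precision:

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 4, 8  # hypothetical dimensions and scaling

W = torch.randn(d, d)        # base weight
A = torch.randn(r, d)        # LoRA down-projection
B = torch.randn(d, r) * 0.1  # LoRA up-projection (non-zero after training)

# Merged weight: W' = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * (B @ A)

x = torch.randn(2, d)
# Unmerged forward: base path plus separate low-rank adapter path
y_unmerged = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
# Merged forward: a single matmul
y_merged = x @ W_merged.T

# Equivalent up to floating-point rounding
assert torch.allclose(y_unmerged, y_merged, atol=1e-5)
```

Evaluating with the adapter unmerged keeps an extra low-rank matmul in the forward pass, which is one more place where kernel- and batch-size-dependent rounding can move metrics by fractions of a point.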

tausenden avatar Dec 01 '25 14:12 tausenden

> Yes, we use a similar LoRA setup with ColPali-v1.3. We evaluate the model using the ViDoRe benchmark API; the LoRA adapter is not merged before eval.

Any specific model? It looks like a colqwen3 model was pushed recently by TomoroAI, and they didn't report any similar issue, but at first glance they haven't used LoRA: https://huggingface.co/TomoroAI/tomoro-colqwen3-embed-4b. I could be wrong.

athrael-soju avatar Dec 01 '25 15:12 athrael-soju