Shaomu
> Thanks for your interest in my work!
>
> As a sanity check step, can you try training `bert-base-multilingual-uncased` with grad cache _disabled_ and compare memory usage against some...
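For the memory comparison, here is a minimal sketch of how peak GPU memory could be recorded around a training step, assuming a plain PyTorch setup (the commented-out training calls are placeholders, not the repo's actual API):

```python
import torch

def report_peak_memory(tag: str) -> None:
    # Peak memory allocated by tensors since the last reset, in GB.
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] peak GPU memory: {peak_gb:.2f} GB")

torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward pass with grad cache disabled here ...
report_peak_memory("grad cache disabled")

torch.cuda.reset_peak_memory_stats()
# ... run the same pass with grad cache enabled here ...
report_peak_memory("grad cache enabled")
```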
Another question: does `train_dense_retriever` support multi-GPU training as well? Since mBERT requires more memory, I think using multiple GPUs might help. I tried to use `python -m torch.distributed.launch --nproc_per_node=4 train_dense_retriever.py`...
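For reference, a minimal sketch of what a training script needs for `torch.distributed.launch` to drive it; this is the generic PyTorch DDP pattern, not the actual `train_dense_retriever.py` code, so whether the repo already does this is exactly my question:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # torch.distributed.launch spawns one process per GPU; newer versions export
    # LOCAL_RANK, while older ones pass --local_rank as a CLI argument instead.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    # Wrap the model so gradients are averaged across the 4 processes.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```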
> Could you try sharing the full log?
>
> Meanwhile here is a checklist for things to test out:
>
> * Make sure you have a valid display...
> Have you checked out the `get_batch_scores` method yet? It sounds like this might be what you're looking for.

I think `get_batch_scores` computes the BM25 scores between one...
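If this refers to the `rank_bm25` package (an assumption on my part; the thread does not name the library), `get_batch_scores` scores one query against a chosen subset of corpus documents rather than the whole collection, roughly like this:

```python
from rank_bm25 import BM25Okapi

corpus = [
    "hello there good man",
    "it is quite windy in london",
    "how is the weather today",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "windy london".split()
# Score the query against documents 1 and 2 only (indices into the corpus).
subset_scores = bm25.get_batch_scores(query, doc_ids=[1, 2])
print(subset_scores)
```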
> @Smu-Tan @puzzlecollector were you able to find an alternative to this implementation to speed up the process?

Check out Pyserini.
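A rough sketch of the Pyserini route, assuming a recent Pyserini version and using a prebuilt index name purely as an example:

```python
from pyserini.search.lucene import LuceneSearcher

# Lucene-backed BM25 search; much faster than a pure-Python BM25 loop.
searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")
hits = searcher.search("what is the weather in london", k=10)
for hit in hits:
    print(hit.docid, hit.score)
```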
Three "solutions" work for my case:

1. use ZeRO-2 + bf16 instead of ZeRO-2 offload + bf16;
2. use fp16 rather than bf16 (this works with ZeRO-2 offload);
3. change the source...
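As a sketch of the first workaround, assuming a standard DeepSpeed config written here as a Python dict (e.g. passed to HF `TrainingArguments(deepspeed=...)`): keep ZeRO stage 2 and bf16, but drop the optimizer offload block.

```python
# ZeRO-2 + bf16, with the CPU optimizer offload removed (workaround 1).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # The failing variant additionally had:
        # "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```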
@waterluck Not sure if it helps, but you could check [this](https://huggingface.co/blog/zh/deepspeed-to-fsdp-and-back).
Same problem when using multiple nodes: the job gets stuck when initializing the critic model: `(WorkerDict pid=826494) Qwen2ForTokenClassification contains 13.99B parameters (WorkerDict pid=826494) Before critic FSDP, memory allocated (GB): 0.00, memory reserved (GB):...`