I was wondering how the pre-trained model can be fine-tuned on a smaller dataset. What about implementing the coverage mechanism during fine-tuning? Do you propose specific...
I see in the code that two models (distilbert-base-uncased, msmarco-distilbert-margin-mse) are recommended as initial checkpoints. I tried to use other Sentence-Transformers models, such as all-mpnet-base-v2, but it didn't work...
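For context, this is roughly what I tried. It is only a sketch that assumes the `gpl.train` entry point and the `base_ckpt` argument as shown in the repository README; the dataset name and paths are placeholders:

```python
import gpl

gpl.train(
    path_to_generated_data="generated/my-dataset",   # placeholder path
    # Recommended starting checkpoints per the README:
    base_ckpt="distilbert-base-uncased",
    # base_ckpt="GPL/msmarco-distilbert-margin-mse",
    # What I tried instead (a Sentence-Transformers checkpoint):
    # base_ckpt="sentence-transformers/all-mpnet-base-v2",
    gpl_score_function="dot",
    batch_size_gpl=32,
    gpl_steps=140000,
    output_dir="output/my-dataset",
    generator="BeIR/query-gen-msmarco-t5-base-v1",
    retrievers=["msmarco-distilbert-base-v3", "msmarco-MiniLM-L-6-v3"],
    retriever_score_functions=["cos_sim", "cos_sim"],
    cross_encoder="cross-encoder/ms-marco-MiniLM-L-6-v2",
)
```

With the commented-out `all-mpnet-base-v2` line enabled, training fails to start for me.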
Is there a way to train the GPL model on multiple GPUs? If so, can that help with training larger batches?
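To make the batch-size part concrete, here is a small sketch of what I have in mind: scaling `batch_size_gpl` with the number of visible GPUs. This assumes the `batch_size_gpl` argument of `gpl.train`; whether training actually spreads that batch across devices is exactly what I'm asking.

```python
import torch
import gpl

# Count the GPUs visible to PyTorch (fall back to 1 for CPU-only runs).
n_gpus = max(torch.cuda.device_count(), 1)

# Hypothetical scaling: grow the GPL batch size with the number of GPUs.
# Whether gpl.train parallelises this batch across devices is the open question.
gpl.train(
    path_to_generated_data="generated/my-dataset",   # placeholder path
    base_ckpt="distilbert-base-uncased",
    batch_size_gpl=32 * n_gpus,
    gpl_steps=140000,
    output_dir="output/my-dataset",
)
```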