Dinesh Khandelwal
Hi, in Tutorial 1 the target sequence is used at evaluation time. I think that during evaluation we should stop generating the output sequence as soon as we...
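A minimal sketch of the idea, assuming a seq2seq model callable as `model(src, tgt) -> logits` (a hypothetical signature, not the tutorial's actual API) with known BOS/EOS ids — generation stops as soon as EOS is produced, instead of being driven by the ground-truth target:

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=128):
    # Start every sequence with BOS and grow it one token at a time.
    ys = torch.full((src.size(0), 1), bos_id, dtype=torch.long, device=src.device)
    for _ in range(max_len - 1):
        logits = model(src, ys)  # assumed: (src, tgt) -> (batch, time, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_token], dim=1)
        # For clarity this breaks only once every sequence emits EOS in the
        # same step; production code would track a per-sequence finished mask.
        if (next_token == eos_id).all():
            break
    return ys
```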
I have prompt-tuned the ``Falcon-7B-Instruct`` model. Now I want to run inference with the prompt-tuned model in a multi-GPU setting using ``accelerate``. I am using 2 A100 GPUs and batch...
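A minimal sketch of such a setup, not a definitive recipe: the adapter path is a placeholder, and depending on the transformers version loading Falcon may require `trust_remote_code=True`. Each process holds a full copy of the model and handles its own shard of the prompts via `accelerator.split_between_processes`:

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

accelerator = Accelerator()

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map={"": accelerator.process_index},  # one full model copy per GPU
)
model = PeftModel.from_pretrained(base, "path/to/prompt-tuned-adapter")  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")

prompts = ["Explain prompt tuning.", "What is PEFT?"]  # evaluation prompts

# Shard the prompts across the processes; each GPU generates for its shard.
with accelerator.split_between_processes(prompts) as shard:
    for prompt in shard:
        inputs = tokenizer(prompt, return_tensors="pt").to(accelerator.device)
        out = model.generate(**inputs, max_new_tokens=64)
        print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Launched with `accelerate launch --num_processes 2 infer.py` for the 2-GPU case.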
@ledw I was training the Bi-Encoder on the Zero-shot EL dataset. I found that the `load_entity_dict_zeshel` function in `zeshel_utils.py` uses only the first 256 characters of the entity...
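A minimal sketch of the truncation being described (not the verbatim BLINK code; the field name and constant name are assumptions):

```python
ENT_MAX_CHARS = 256  # hypothetical name for the cutoff constant

def load_entity_text(entity):
    text = entity["text"]         # assumed field holding the full description
    return text[:ENT_MAX_CHARS]   # anything past 256 characters is silently dropped
```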
I have a few questions. 1. Using the data in the folder [https://github.com/facebookresearch/CodeGen/tree/main/data/test_dataset](https://github.com/facebookresearch/CodeGen/tree/main/data/test_dataset), I learned BPE codes and a vocab in Monolingual Functions mode. I want to know how to...
At line 1483 of codegen_sources/model/src/trainer.py, the code is `self.n_sentences += params.batch_size`. I think it should be `self.n_sentences += len1.size(0)`: https://github.com/facebookresearch/CodeGen/blob/6e93aca63e7bc77287c9965a5080456326651237/codegen_sources/model/src/trainer.py#L1483 With the above bug, the notion of one epoch...
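A quick arithmetic illustration of why accumulating `params.batch_size` overcounts: the last batch of an epoch is usually smaller than `batch_size`, while `len1.size(0)` is the actual number of sentences in the batch.

```python
# e.g. 1000 training sentences with batch_size = 64
n_train, batch_size = 1000, 64

steps = (n_train + batch_size - 1) // batch_size  # 16 steps per epoch
overcount = steps * batch_size                    # 1024 "sentences" counted
exact = sum(min(batch_size, n_train - i * batch_size) for i in range(steps))

print(overcount, exact)  # 1024 1000
```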
How do you select the best checkpoint when fine-tuning on GSM8K?
In the gsm8k script ([link](https://github.com/sail-sg/sdft/tree/main/scripts/gsm8k)), the distilled dataset is generated using fp16 precision, while the model is trained on this dataset using bf16. Shouldn't the precision format be consistent throughout...
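For context on why the two formats are not interchangeable, a quick check of their numerics: fp16 has a finer mantissa but a narrow range, while bf16 keeps the fp32 exponent range at much coarser precision.

```python
import torch

print(torch.finfo(torch.float16))   # eps≈9.77e-4, max=65504
print(torch.finfo(torch.bfloat16))  # eps≈7.81e-3, max≈3.39e+38
```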