Wasi Ahmad
What is a document id? Do you mean we need to provide each document/summary in separate files? I have all the summaries in one file and all the gold summaries in...
Thank you for pointing this out. It is a bug, as we can see [here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/plbart/tokenization_plbart.py#L90). Instead, `FAIRSEQ_LANGUAGE_CODES` should be defined as: ``` FAIRSEQ_LANGUAGE_CODES = { "base": ["__java__", "__python__", "__en_XX__"], "multi":...
@gchhablani Can you help resolve the bug? The [FAIRSEQ_LANGUAGE_CODES](https://github.com/huggingface/transformers/blob/main/src/transformers/models/plbart/tokenization_plbart.py#L90) should be defined as: ``` FAIRSEQ_LANGUAGE_CODES = { "base": ["__java__", "__python__", "__en_XX__"], "multi": ["__java__", "__python__", "__en_XX__", "__javascript__", "__php__", "__ruby__", "__go__"], }...
Resolved with this PR (https://github.com/huggingface/transformers/pull/19980).
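For readers following the `FAIRSEQ_LANGUAGE_CODES` discussion above, here is a minimal sketch of how a per-variant code table like the one proposed could be turned into token ids. The dictionary values are copied from the comment above; the helper `build_lang_code_to_id` and the `vocab_size` value are hypothetical and only illustrate the idea of appending language codes after a base vocabulary. This is not the actual `transformers` implementation.

```python
# Values copied from the proposed definition quoted in the comments above.
FAIRSEQ_LANGUAGE_CODES = {
    "base": ["__java__", "__python__", "__en_XX__"],
    "multi": [
        "__java__", "__python__", "__en_XX__",
        "__javascript__", "__php__", "__ruby__", "__go__",
    ],
}


def build_lang_code_to_id(variant, vocab_size):
    """Hypothetical helper: assign consecutive ids after the base vocabulary."""
    codes = FAIRSEQ_LANGUAGE_CODES[variant]
    return {code: vocab_size + offset for offset, code in enumerate(codes)}


# vocab_size=100 is an arbitrary illustrative number, not a real vocabulary size.
base_mapping = build_lang_code_to_id("base", vocab_size=100)
multi_mapping = build_lang_code_to_id("multi", vocab_size=100)
```

Because each variant carries its own list, the codes a checkpoint knows about depend on whether it is a `base` or a `multi` checkpoint, which is why the table needs both keys.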
We have resolved the issue by re-crawling the dataset. We released the new dataset along with other updates.
Can you please paste the entire error log? We are unable to reproduce the error. **[Update]** We were able to reproduce the error. ```Traceback (most recent call last): File "/home/ec2-user/CodeSage/evaluation/defect_prediction.py",...
For the baseline models, you can take a look at the [CodeBERT](https://github.com/microsoft/CodeBERT/tree/master) repository, which is the basis of our code for finetuning and evaluating CodeSage. We do not plan to...
Is it intended to be used as an instruction-following LLM?
Yes, when we finetuned PLBART on the code refinement task, we didn't use the language token. So I am not sure why we need the token to generate refined Java...