Wasi Ahmad comments

12 comments by Wasi Ahmad

IndexError: tuple index out of range

What is the document ID? Do you mean we need to provide each document/summary in a separate file? I have all the summaries in one file and all the gold summaries in...
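If the evaluation script does expect one file per document, a single file with one summary per line could be split roughly as follows. This is a minimal sketch under two assumptions not stated in the comment: line *i* of each file belongs to the document with ID *i*, and each output file is named `<id>.txt`. The helper `split_summaries` and the naming scheme are hypothetical, not the tool's actual convention.

```python
from pathlib import Path


def split_summaries(src: str, out_dir: str) -> list[Path]:
    """Hypothetical helper: write each line of `src` to its own file.

    Assumes line i of `src` is the summary for document ID i.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    for doc_id, summary in enumerate(lines):
        path = out / f"{doc_id}.txt"  # assumed naming scheme: <id>.txt
        path.write_text(summary + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Running it once on the system summaries and once on the gold summaries, into separate directories, keeps matching IDs aligned across the two sets.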

Any plans to release sample code for MINI-LM distillation?

Missing "java" token in Hugging Face Tokenizer

Thank you for pointing this out. It is a bug, as we can see [here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/plbart/tokenization_plbart.py#L90). Instead, `FAIRSEQ_LANGUAGE_CODES` should be defined as:

```
FAIRSEQ_LANGUAGE_CODES = {
    "base": ["__java__", "__python__", "__en_XX__"],
    "multi":...
```


@gchhablani Can you help resolve the bug? The [FAIRSEQ_LANGUAGE_CODES](https://github.com/huggingface/transformers/blob/main/src/transformers/models/plbart/tokenization_plbart.py#L90) should be defined as:

```
FAIRSEQ_LANGUAGE_CODES = {
    "base": ["__java__", "__python__", "__en_XX__"],
    "multi": ["__java__", "__python__", "__en_XX__", "__javascript__", "__php__", "__ruby__", "__go__"],
}
```

...
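To illustrate what the corrected mapping guarantees, here is a minimal Python sketch that reuses the `FAIRSEQ_LANGUAGE_CODES` definition proposed in this comment. The helper `lang_code_token`, which wraps a bare language name such as `java` in double underscores and checks it against the chosen variant, is hypothetical; it is not the actual logic in `tokenization_plbart.py`.

```python
# Corrected mapping, as proposed in the comment.
FAIRSEQ_LANGUAGE_CODES = {
    "base": ["__java__", "__python__", "__en_XX__"],
    "multi": ["__java__", "__python__", "__en_XX__", "__javascript__", "__php__", "__ruby__", "__go__"],
}


def lang_code_token(lang: str, variant: str = "base") -> str:
    """Hypothetical helper: map a bare language name to its special token."""
    token = f"__{lang}__"
    if token not in FAIRSEQ_LANGUAGE_CODES[variant]:
        raise KeyError(f"{lang!r} is not a language code for the {variant!r} variant")
    return token


print(lang_code_token("java"))         # __java__
print(lang_code_token("go", "multi"))  # __go__
```

With this definition, `__java__` is present in both the `base` and `multi` variants, which is the property the missing-token report depends on.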


Resolved with this PR (https://github.com/huggingface/transformers/pull/19980).

Absent new_lines and indentation in python data

We have resolved the issue by re-crawling the dataset. We released the new dataset along with other updates.

defect prediction task error

Can you please paste the entire error log? We are unable to reproduce the error.

**[Update]** We were able to reproduce the error.

```
Traceback (most recent call last):
  File "/home/ec2-user/CodeSage/evaluation/defect_prediction.py",...
```

Baselines for CodeSage

For the baseline models, you can take a look at the [CodeBERT](https://github.com/microsoft/CodeBERT/tree/master) repository, which is the basis of our code for fine-tuning and evaluating CodeSage. We do not plan to...

Why is DeepSeek-Coder-v2 236B not trained with the FIM objective?

Is it intended to be used as an instruction-following LLM?

HuggingFace Checkpoint Configurations

Yes, when we fine-tuned PLBART on the code refinement task, we didn't use the language token, so I am not sure why the token is needed to generate refined java...