ZHU QIHAO
OK, thank you very much.
You can see https://github.com/EleutherAI/gpt-neox/blob/FIM-clean/megatron/data/gpt2_dataset.py#L339. We apply the FIM corruption at the character level.
After the context is cut into 4096-token chunks, FIM is applied at the document level.
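Roughly, the character-level FIM looks like the sketch below (the sentinel strings, the 0.5 FIM rate, and the function name are illustrative assumptions; the real logic lives in the gpt2_dataset.py linked above):

```python
import random

# Hypothetical sentinel strings for illustration; in the real pipeline these
# are dedicated special tokens in the vocabulary.
FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"

def fim_transform(doc: str, fim_rate: float = 0.5) -> str:
    """Character-level FIM: split one document at two random character offsets."""
    if len(doc) < 2 or random.random() > fim_rate:
        return doc  # leave this document untouched
    lo, hi = sorted(random.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:lo], doc[lo:hi], doc[hi:]
    # PSM layout: the model sees prefix and suffix, then learns to fill the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```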
The SFT model still works with a plain completion prompt like "const printHelloWorld = () => {}"; do not add the "### Instruction" prefix.
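For example, a minimal completion-style sketch with transformers (the checkpoint name and generation settings here are assumptions, not the only valid setup):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Raw code prefix, no "### Instruction" wrapper.
prompt = "const printHelloWorld = () => {"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```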
You can load the model directly from GGUF. The GGUF models are here: https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF
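A minimal sketch with llama-cpp-python, assuming you downloaded one of the quantized files from that repo (the exact file name depends on which quantization you pick):

```python
from llama_cpp import Llama

# Path to a quant downloaded from TheBloke/deepseek-coder-6.7B-instruct-GGUF;
# adjust the file name to match the quantization you chose.
llm = Llama(model_path="./deepseek-coder-6.7b-instruct.Q4_K_M.gguf", n_ctx=4096)

out = llm("// write a hello-world function in JavaScript\n", max_tokens=64)
print(out["choices"][0]["text"])
```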
Due to a Jinja2 version issue, the template code in the original tokenizer_config.json had some problems. It has now been fixed and uploaded; please re-download and retry this case.
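To check the fix, a quick sketch that forces a fresh download of the repaired tokenizer_config.json and renders the chat template (the model id here is an assumption):

```python
from transformers import AutoTokenizer

# force_download=True makes sure the repaired tokenizer_config.json is fetched.
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    trust_remote_code=True,
    force_download=True,
)

messages = [{"role": "user", "content": "write a quick sort algorithm in python"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # should render without a Jinja2 TemplateError now
```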
char_voc.pkl and nl_voc.pkl are the vocabulary files used to tokenize code for the code readers.
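As a rough sketch of loading such pickled vocabularies (their internal structure is an assumption here, commonly a token-to-id dict, so inspect them first):

```python
import pickle

# Load the two vocabularies; their exact structure is an assumption,
# so print a few entries to check before tokenizing with them.
with open("char_voc.pkl", "rb") as f:
    char_voc = pickle.load(f)
with open("nl_voc.pkl", "rb") as f:
    nl_voc = pickle.load(f)

print(type(char_voc), len(char_voc))
if isinstance(char_voc, dict):
    print(list(char_voc.items())[:5])
```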
It is probably a "CUDA out of memory" error.
If you want to change the batch size, change the number in the "args" dict. If you want to use multiple GPUs, you need to modify "model...
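For illustration, a sketch of both changes (the "args" keys, the dummy model, and the DataParallel wrapper are assumptions based on the description above, not the script's actual code):

```python
import torch
import torch.nn as nn

# Assumed structure: the training script keeps hyperparameters in a dict "args".
args = {"batch_size": 8}
args["batch_size"] = 2  # smaller batches reduce the chance of a CUDA OOM

# Dummy stand-in for the script's model; replace with the real one.
model = nn.Linear(16, 16)

# Wrapping in DataParallel splits each batch across all visible GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```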
How many GPUs are you using? And what is the batch size?