DenceChen
@sainttelant Could you please tell me what flags you used when training? My loss is around 2.0.
@hellochick Can you share your training args? Your README does not include any training instructions.
keras_contrib's CRF does not support masking.
I have the same problem
This framework needs a lot of GPU memory, so if you optimize that part it will be a perfect framework.
Could everyone share their training scripts? Mine stops working as soon as I switch to the a3b model, and this is on 8x A800 GPUs with 80 GB of memory each. Frustrating.
```
accelerate launch \
    --main_process_port 25515 \
    --config_file ./scripts/config.yaml \
    ./src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path ${model_path} \
    --dataset $train_ds \
    --dataset_dir /opt/nas/p/learning_platform/zouyapeng/docsum/LLaMA-Factory/data \
    --template qwen3 \
    ...
```
tokenizer_config.json -> chat_template -> {%- if enable_thinking is not defined or enable_thinking is false %}
```
151648: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151649: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True)
```
Could the cause be that `special` is set to True? I see that models which natively ship with this tag have `special` set to False.
@Alice1998 @qubingxin @katouHui I solved it: don't add them as special tokens, add them as normal tokens instead. /root/dence/gendata/LLaMA-Factory/src/llamafactory/model/patcher.py
```
def patch_tokenizer(tokenizer: "PreTrainedTokenizer", model_args: "ModelArguments") -> None:
    if "PreTrainedTokenizerBase" not in str(tokenizer._pad.__func__):
        tokenizer._pad = MethodType(PreTrainedTokenizerBase._pad, tokenizer)

    if model_args.model_max_length is not None and...
```
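For anyone following along, a minimal sketch of the distinction being described, using the standard Hugging Face `transformers` tokenizer API. The token strings `<new_tag>`/`</new_tag>` and the model path are placeholders, not the actual tokens or paths from the comment above; this only illustrates "add as normal tokens (special=False) rather than special tokens".
```python
# Sketch only: add new tags as *normal* tokens so they are not registered
# with special=True in the saved tokenizer config. Placeholder names throughout.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/base_model")   # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/base_model")

# Normal tokens (special=False); "<new_tag>"/"</new_tag>" are hypothetical tags.
num_added = tokenizer.add_tokens(["<new_tag>", "</new_tag>"], special_tokens=False)

# The alternative the comment advises against would be:
#   tokenizer.add_special_tokens({"additional_special_tokens": ["<new_tag>", "</new_tag>"]})
# which registers the tokens with special=True.

if num_added > 0:
    # Grow the embedding matrix so the new token ids have embeddings to train.
    model.resize_token_embeddings(len(tokenizer))
```
Whether normal or special tokens are appropriate depends on how the chat template and decoding handle the tags; the comment above only reports that switching from special to normal tokens resolved the issue in that setup.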