matrixssy
```
  File "/home/paas/vllm/vllm/engine/llm_engine.py", line 222, in _init_tokenizer
    self.tokenizer: BaseTokenizerGroup = get_tokenizer_group(
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
    return TokenizerGroup(**init_kwargs)
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
  File...
```
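For context, a minimal sketch of how this code path is usually reached: the traceback starts inside engine construction, so the smallest reproduction is simply building an `LLM`. The model path below is a placeholder, not taken from the report.

```python
# Minimal reproduction sketch: constructing an LLM runs LLMEngine._init_tokenizer,
# which is where the truncated traceback above originates. The model path is a
# placeholder; the actual model/tokenizer from the report is unknown.
from vllm import LLM

llm = LLM(model="/path/to/model")  # tokenizer defaults to the model path
```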
Support the Mixtral 8x7B MoE model structure and a weight converter from Hugging Face. You can refer to this script to convert the Hugging Face weights to Megatron format:
```shell
python tools/checkpoint/util.py --model-type GPT --loader...
```
I have been fine-tuning Mistral-7B-Instruct-v0.2 recently and I noticed that when I don't use SWA and train with a sequence length of 32K, the initial loss is unusually high (6.0)....
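For reference, this is roughly how the SWA toggle looks on the Hugging Face side (a sketch, not taken from the report; whether the window is actually enforced depends on the attention backend in use):

```python
# Sketch of toggling sliding-window attention via the config. Setting
# sliding_window=None trains with full attention over the 32K sequence;
# 4096 restores the original Mistral window. Whether the window is honored
# depends on the attention implementation in use.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
config.sliding_window = None  # or 4096 to re-enable SWA
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", config=config
)
```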
I followed the "LoRA model only" step of the guidance and ran convert_lora_safetensor_to_diffusers.py successfully. However, the generated pictures seem to show no difference from the original SD 1.5.
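One quick way to check whether the LoRA weights were actually merged (a diagnostic sketch, not part of the original report; the converted-pipeline path is a placeholder) is to compare the converted UNet against the base SD 1.5 UNet:

```python
# Diagnostic sketch: count how many UNet tensors differ between the base
# SD 1.5 pipeline and the converted one. Zero differences would mean the
# LoRA was never applied. "path/to/converted" is a placeholder output path.
import torch
from diffusers import StableDiffusionPipeline

base = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
merged = StableDiffusionPipeline.from_pretrained("path/to/converted")

merged_sd = merged.unet.state_dict()
changed = sum(
    not torch.allclose(p, merged_sd[name])
    for name, p in base.unet.state_dict().items()
)
print(f"{changed} UNet tensors differ from the base model")
```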