matrixssy
```
  File "/home/paas/vllm/vllm/engine/llm_engine.py", line 222, in _init_tokenizer
    self.tokenizer: BaseTokenizerGroup = get_tokenizer_group(
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
    return TokenizerGroup(**init_kwargs)
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
  File...
```
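For context, a minimal sketch of how this code path is usually reached: the traceback starts inside engine construction, so the smallest reproduction is simply building an `LLM`. The model path below is a placeholder, not taken from the report.

```python
# Minimal reproduction sketch: constructing an LLM runs LLMEngine._init_tokenizer,
# which is where the truncated traceback above originates. The model path is a
# placeholder; the actual model/tokenizer from the report is unknown.
from vllm import LLM

llm = LLM(model="/path/to/model")  # tokenizer defaults to the model path
```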
Support the Mixtral 8x7B MoE model structure and a weight converter from Hugging Face. You can refer to this script to convert the Hugging Face weights to Megatron format:
```shell
python tools/checkpoint/util.py --model-type GPT --loader...
```
I have been fine-tuning Mistral-7B-Instruct-v0.2 recently and I noticed that when I don't use SWA and train with a sequence length of 32K, the initial loss is unusually high (6.0)....
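For reference, this is roughly how the SWA toggle looks on the Hugging Face side (a sketch, not taken from the report; whether the window is actually enforced depends on the attention backend in use):

```python
# Sketch of toggling sliding-window attention via the config. Setting
# sliding_window=None trains with full attention over the 32K sequence;
# 4096 restores the original Mistral window. Whether the window is honored
# depends on the attention implementation in use.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
config.sliding_window = None  # or 4096 to re-enable SWA
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", config=config
)
```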
I followed the "LoRA model only" step of the guidance and ran convert_lora_safetensor_to_diffusers.py successfully. However, the generated pictures seem to show no difference from the original SD 1.5.
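One quick way to check whether the LoRA weights were actually merged (a diagnostic sketch, not part of the original report; the converted-pipeline path is a placeholder) is to compare the converted UNet against the base SD 1.5 UNet:

```python
# Diagnostic sketch: count how many UNet tensors differ between the base
# SD 1.5 pipeline and the converted one. Zero differences would mean the
# LoRA was never applied. "path/to/converted" is a placeholder output path.
import torch
from diffusers import StableDiffusionPipeline

base = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
merged = StableDiffusionPipeline.from_pretrained("path/to/converted")

merged_sd = merged.unet.state_dict()
changed = sum(
    not torch.allclose(p, merged_sd[name])
    for name, p in base.unet.state_dict().items()
)
print(f"{changed} UNet tensors differ from the base model")
```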