DewEfresh
> Could you share changes to main.py, please?

```python
model_path = "./models/models--mustafaaljadery--gemma-2B-10M"
# tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./models")
model = GemmaForCausalLM.from_pretrained(
    # model_path,
    model_name,
    cache_dir="./models",
    torch_dtype=torch.bfloat16,
)
```
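For reference, a minimal sketch of what the snippet above assumes but doesn't show; the `model_name` value and the note on `GemmaForCausalLM` are guesses, not confirmed from main.py:

```python
# Assumed context for the snippet above (not shown in the original comment).
import torch
from transformers import AutoTokenizer

# Guessed from the cache folder name "models--mustafaaljadery--gemma-2B-10M";
# treat the exact repo id as an assumption.
model_name = "mustafaaljadery/gemma-2B-10M"

# GemmaForCausalLM is assumed to be the repo's own modified class that main.py
# already defines or imports, not the stock transformers implementation.
```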
I can't get past `GemmaModel.forward() got an unexpected keyword argument 'cache_position'`.
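That error usually shows up when the installed transformers release is newer than the one the repo's custom Gemma code was written against, since recent versions pass a `cache_position` kwarg during generation. Matching the transformers version pinned in the repo's requirements is probably the cleaner fix; failing that, here is a rough sketch (an assumption, not a confirmed fix) that drops the kwarg before it reaches the custom forward(), assuming the inner GemmaModel sits at `model.model` as in the stock causal-LM classes:

```python
import functools

def ignore_kwargs(module, *names):
    """Monkey-patch a module instance's forward() to silently drop given kwargs."""
    original = module.forward

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        for name in names:
            kwargs.pop(name, None)  # discard kwargs the old forward() doesn't accept
        return original(*args, **kwargs)

    module.forward = wrapper
    return module

# "cache_position" comes straight from the error message; model.model is assumed
# to be the custom GemmaModel instance inside the GemmaForCausalLM loaded above.
ignore_kwargs(model.model, "cache_position")
```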
You may want to check out https://github.com/IST-DASLab/qmoe. They created some custom CUDA functions for sub-1-bit weights.
https://colab.research.google.com/drive/1nvzhy_PCBZ_r6dlvQv3GfweJsGlZHrNJ?usp=sharing