DewEfresh

4 comments of DewEfresh

> Could you share changes to main.py, please?

```python
model_path = "./models/models--mustafaaljadery--gemma-2B-10M"
#tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./models")
model = GemmaForCausalLM.from_pretrained(
    #model_path,
    model_name,
    cache_dir="./models",
    torch_dtype=torch.bfloat16,
)
```

I can't get past `GemmaModel.forward() got an unexpected keyword argument 'cache_position'`.
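This error typically means the installed `transformers` version passes newer keyword arguments (like `cache_position`) into a custom `forward()` that predates them; pinning `transformers` to the version the repo targets is the cleanest fix. As a stopgap, unsupported kwargs can be filtered out before the call. This is a minimal sketch, not the repo's own code; the helper name and the toy `forward()` are illustrative:

```python
import inspect

def call_without_unsupported_kwargs(fn, *args, **kwargs):
    # Drop any kwargs that fn's signature does not accept,
    # e.g. `cache_position` passed by newer transformers versions.
    accepted = inspect.signature(fn).parameters
    filtered = {k: v for k, v in kwargs.items() if k in accepted}
    return fn(*args, **filtered)

# Toy stand-in for an older-style forward() without cache_position:
def forward(input_ids, attention_mask=None):
    return {"input_ids": input_ids, "attention_mask": attention_mask}

out = call_without_unsupported_kwargs(
    forward, [1, 2, 3], attention_mask=[1, 1, 1], cache_position=[0, 1, 2]
)
# `cache_position` is silently dropped; the call succeeds.
```

Note the caveat: if the wrapped `forward()` accepts `**kwargs`, this filter would drop everything not named explicitly, so it only suits fixed signatures.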

You may want to check out https://github.com/IST-DASLab/qmoe. They created custom CUDA kernels for sub-1-bit weights.

https://colab.research.google.com/drive/1nvzhy_PCBZ_r6dlvQv3GfweJsGlZHrNJ?usp=sharing