Bruce D'Ambrosio
Does this answer my question (issue #48) as well? Any ideas on a timeline?
That would be GREAT; I haven't had much luck. I do have an LLM-compatible server with access between encode and generate, and streaming access between generate and decode, if...
me too. sigh
I'm getting RuntimeError: shape '[1, 34, 64, 128]' is invalid for input of size 34816 for 70B chat; 7B and 13B load fine.
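The numbers in that error line up with a grouped-query-attention mismatch. A minimal arithmetic sketch, assuming llama-2-70b's config uses num_attention_heads=64, num_key_value_heads=8, and head_dim=128 (7B/13B don't use GQA, which would explain why they load fine while 70B doesn't):

# Assumed 70B config values: 64 attention heads, 8 key/value heads, head_dim 128.
seq_len, n_heads, n_kv_heads, head_dim = 34, 64, 8, 128

expected = 1 * seq_len * n_heads * head_dim    # 278528: what the reshape to [1, 34, 64, 128] wants
actual = 1 * seq_len * n_kv_heads * head_dim   # 34816: what a GQA key/value projection yields

print(expected, actual)  # 278528 34816 -- matches the error message

If that's right, the installed transformers simply predates GQA support, which points back at the version question below.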
Hmm, I installed 4.31.0, but I saw 4.31.0.dev0 in the config file; guess I'll try that.
There is no 4.31.0.dev0 available:
python -m pip install --upgrade transformers==4.31.0.dev0
ERROR: Could not find a version that satisfies the requirement transformers==4.31.0.dev0 (from versions: 0.1, 2.0.0, 2.1.0, 2.1.1, 2.2.0, ...)
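For what it's worth, .dev0 builds are not published to PyPI; the usual way to get one is to install transformers directly from the GitHub main branch:

python -m pip install --upgrade git+https://github.com/huggingface/transformers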
import torch
import transformers
from transformers import (
    AutoTokenizer,
    BitsAndBytesConfig,
    AutoModelForCausalLM,
)
from alphawave_pyexts import serverUtils as sv

model_name = '/home/bruce/Downloads/llama/llama-2-70b-chat'
print(f"Loading {model_name}")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)
...
I pass the pipeline to my utility server code that works with literally dozens of other models, including llama-2-7B/13B. I'll add a standalone test (sketched below) just to make sure that isn't...
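Something along these lines should isolate the server code; a minimal sketch, with a placeholder prompt and token budget, assuming the same local model path as above:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = '/home/bruce/Downloads/llama/llama-2-70b-chat'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_4bit=True,
    device_map="auto",
)

# Tiny generation with no server code in the loop, to see if load/generate alone fails.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))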
Loading the re-downloaded model now...
Same error. Checked pytorch too; latest version. My conda env has lots of stuff, maybe I'll try a fresh one... Ubuntu 22.04, up to date, btw. Python 3.11.3.
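Before rebuilding the env, it might be worth printing the versions the interpreter actually resolves, to rule out a stale install shadowing the upgrade:

import sys
import torch
import transformers

# Confirm the interpreter sees the versions pip claims to have installed.
print('python', sys.version)
print('torch', torch.__version__)
print('transformers', transformers.__version__)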