pinchedsquare
@SunMarc Just curious: fundamentally, why does `load_and_quantize_model()` use more memory than `AutoModelForCausalLM.from_pretrained(self.model_name, load_in_8bit=True, device_map="auto")`? In the following 2 examples, the first one succeeds and the 2nd one fails with OOM on...
I'll await your response. Thanks.