Elephant Panda
Please support saving quantized models in 4-bit, 2-bit, and 1-bit formats: 1) to make file sizes smaller, and 2) to support quantized models.
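For what I mean by smaller files, here is a minimal sketch of 4-bit packing (two values per byte) in plain NumPy. The min/max scale scheme is just illustrative, not any library's actual on-disk format:

```python
import numpy as np

def pack_4bit(weights: np.ndarray):
    """Quantize a float32 array to 4-bit and pack two values per byte."""
    w = weights.astype(np.float32).ravel()
    scale = float(w.max() - w.min()) / 15.0 or 1.0
    zero = float(w.min())
    q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
    if q.size % 2:                      # pad to an even count of nibbles
        q = np.append(q, np.uint8(0))
    packed = (q[0::2] << 4) | q[1::2]   # high nibble | low nibble
    return packed, scale, zero

def unpack_4bit(packed: np.ndarray, scale: float, zero: float, n: int):
    """Undo pack_4bit: split nibbles and dequantize back to float32."""
    q = np.empty(packed.size * 2, dtype=np.uint8)
    q[0::2] = packed >> 4
    q[1::2] = packed & 0x0F
    return q[:n].astype(np.float32) * scale + zero

w = np.random.randn(1000).astype(np.float32)
packed, scale, zero = pack_4bit(w)
print(packed.nbytes, "bytes instead of", w.nbytes)  # ~500 vs 4000
```

Roughly an 8x reduction versus float32, minus a couple of floats of metadata per tensor.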
Has this been abandoned? Any news on the C# training roadmap? Are there any builds with C# training?
I've tried 384x384 and got decent results. 256x256, though, is bad.
Free the weights! I will pay ten dollars for download costs.
I heard there is a 7B-weight model. The float16 version would then be about 14GB. I guess this is out of reach for the 3080. (Maybe it...
> I believe with CPU offloading, it should be possible.
>
> A related guide: https://huggingface.co/docs/accelerate/usage_guides/big_modeling

Interesting... this might be useful for other models I'm running. I have a feeling...
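The pattern from that accelerate guide looks roughly like this (the model id and checkpoint path below are placeholders, not any specific release):

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton without allocating real weight memory.
config = AutoConfig.from_pretrained("some/7b-model")      # placeholder model id
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Load the checkpoint, splitting layers between GPU, CPU RAM and disk automatically.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/checkpoint",                       # placeholder path
    device_map="auto",
    offload_folder="offload",                              # spills to disk if RAM runs out
)
```

With `device_map="auto"` the layers that don't fit in VRAM get kept on the CPU (or disk) and moved over as needed, which is why it's slower but still works.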
> FlexGen 4bit is already here, so I think you can run a 20B model locally, let's wait and see...

I've no doubt it can be run locally. The question...
> No chance

Well it did #105. (12GB RAM and 16GB VRAM) 😎 Lesson - don't listen to doubters. 😁
7B in float16 will be 14GB, and if quantized to uint8 it could be as low as 7GB. But on graphics cards, from what I've tried with other models, it...
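The back-of-the-envelope arithmetic for the weights alone (not counting activations or KV cache, which add more on top):

```python
# Weight memory for a 7B-parameter model at different precisions
params = 7e9
for name, bytes_per_param in [("float32", 4), ("float16", 2), ("uint8", 1), ("4-bit", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.1f} GB")
# float32: 28.0 GB, float16: 14.0 GB, uint8: 7.0 GB, 4-bit: 3.5 GB
```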
> With KoboldAI I was able to run GPT-J 6B on my 8GB 3070 Ti by offloading the model to my RAM

How fast was it?