Dango233

Results 14 comments of Dango233

Same here. Would be great if you could help update the pretrained model.

Great work! @MatthieuTPHR I was able to get a +60% speedup on an A40 on the UNet. But this seems to **break torch.jit.trace**; I'm getting this error: `RuntimeError: unsupported output...

Understood, if we want gradients computed. For forward-pass-only JIT, would fixing the output type of the op work? The int output of the op is where jit breaks....

I have exactly the same problem... Any clue?

I used a model trained under the CompVis/Stable-diffusion format. I got it converted to diffusers format using this conversion script: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py
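For reference, a hypothetical invocation of that conversion script might look like the following. The flag names are assumptions based on the script's argparse options at the time; check `--help` against your diffusers checkout, and the paths are placeholders:

```shell
# Sketch only: convert a CompVis/Stable-diffusion .ckpt into the diffusers
# directory layout. Flag names and paths below are assumptions; verify with
# `python scripts/convert_original_stable_diffusion_to_diffusers.py --help`.
python scripts/convert_original_stable_diffusion_to_diffusers.py \
    --checkpoint_path path/to/model.ckpt \
    --dump_path path/to/output_dir
```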

It seems that a config saved using diffusers' `save_pretrained` method will have this problem.

> https://github.com/mindspore-ai/models/tree/master/research/mm/wukong

+1 for this - for me, not supporting a prefix in `/v1/chat/completion` is the largest gap between llama.cpp and common API providers & LM Studio...

Problem fixed.

> @Dango233 Hello, may I ask what this prompt means: "hot attention enabled: running without shot embedding."

It's debug info - if a ckpt comes with...

Oh, hmm. Try not to use torch.compile for now. I'll look into whether we can use compile at all.