Pam
After some benchmarking I've found that LLaMA 13B with `load_in_8bit=True` actually gains no noticeable performance boost from either `torch.compile` or sdp attention hijacking. But the gain from `llm_int8_threshold=0` is 40%!...
Extreme values for `llm_int8_threshold` like 60 or 1000 give a speedup as well. As stated [here](https://huggingface.co/docs/transformers/main/main_classes/quantization#play-with-llmint8threshold), this threshold is responsible for swapping operations between int8 and fp16, which seems to...
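For reference, a minimal configuration sketch of how the threshold can be set when loading the model through transformers + bitsandbytes (the model path is a placeholder, and a CUDA GPU plus both libraries are assumed):

```python
# Sketch: loading LLaMA 13B in 8-bit with a custom llm_int8_threshold.
# Assumes transformers + bitsandbytes are installed and a CUDA GPU is available.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # 0 disables the fp16 outlier path entirely; bitsandbytes' default is 6.0.
    llm_int8_threshold=0.0,
)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b",  # placeholder path
    quantization_config=quant_config,
    device_map="auto",
)
```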
> Also, how do you install pytorch 2.0? If I try doing so with conda, then `pip install -r requirements.txt` drags in 1.x again.

I personally use good ol' venv...
@Sakura-Luna It can't, images with the same seeds are still non-deterministic. It's better (at least for inference tasks) because it doesn't require the `xformers` library for users on PyTorch 2.x.
> Essentially, PyTorch 2.x users will not need xformers at all.

I'm getting a strange runtime exception when training both with and without mem-efficient sdp; I've mentioned it [here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/6932#discussioncomment-5221340). I believe...
> Maybe you should separate `sdp` and `sdp-no-mem`, it helps to simplify usage parameters.

I've made the arguments similar to the existing `--xformers` and `--xformers-flash-attention`, but perhaps you're right.
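For context, the difference between the two proposed options boils down to which PyTorch SDP backends are allowed. A rough sketch, using PyTorch 2.x's `torch.backends.cuda.sdp_kernel` context manager (the `--sdp` / `--sdp-no-mem` names are just the ones from the discussion, not real flags):

```python
# Sketch: selecting SDP backends in PyTorch 2.x, roughly what separate
# --sdp / --sdp-no-mem options would toggle (names taken from the discussion).
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# --sdp: let PyTorch pick any backend (flash, mem-efficient, or math).
out = F.scaled_dot_product_attention(q, k, v)

# --sdp-no-mem: disallow the memory-efficient backend on CUDA.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=True, enable_mem_efficient=False
):
    out_no_mem = F.scaled_dot_product_attention(q, k, v)
```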
> It just looks like some devices are being ignored, have you tried disabling flash attention to fix it?

Yep, it works fine, but it's noticeably slower than xformers.
I think it's a bit early to add these samplers upstream. There are some possible issues with the original implementation (see https://github.com/Koishi-Star/Euler-Smea-Dyn-Sampler/issues/5#issue-2232652349 and https://github.com/Koishi-Star/Euler-Smea-Dyn-Sampler/issues/7#issue-2232823982). Plus, as [implied...
I think I should turn `llm_int8_threshold` into a startup argument, so that everyone can experiment with the threshold, since its impact on performance and memory varies across configurations.
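A minimal sketch of what such a startup argument could look like (the flag name and the 6.0 default, bitsandbytes' own default, are assumptions):

```python
# Sketch: exposing llm_int8_threshold as a startup argument.
# Flag name and default are assumptions, mirroring the project's CLI style.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--llm-int8-threshold",
    type=float,
    default=6.0,
    help="Outlier threshold for int8 matmul; 0 disables the fp16 outlier path.",
)

args = parser.parse_args(["--llm-int8-threshold", "0"])
print(args.llm_int8_threshold)  # → 0.0
```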
Probably related to #7