turboderp

180 comments

Is it possible they're taking so long to load because of the datatype? If Torch doesn't have an efficient bfloat16->float16 function, it might end up in some super-inefficient fallback routine...
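For illustration, the kind of explicit cast that avoids any per-element fallback (a rough sketch; the shapes are made up):

```python
import torch

# Toy example: one explicit, vectorized bfloat16 -> float16 cast. In Torch
# this dispatches to a single native conversion kernel instead of whatever
# slow fallback path an implicit per-tensor conversion might hit.
w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp16 = w_bf16.to(torch.float16)
```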

Well, if it works with 7b and 13b it's most likely related to GQA. Everything up until that 70b release has assumed that the number of heads is the same...
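To make the mismatch concrete, a toy sketch of the GQA shapes (numbers from Llama 2 70b: 64 query heads, 8 key/value heads; the variable names are illustrative, not ExLlama's):

```python
import torch

num_heads, num_kv_heads, head_dim, seq_len = 64, 8, 128, 16

q = torch.randn(1, num_heads, seq_len, head_dim)
k = torch.randn(1, num_kv_heads, seq_len, head_dim)

# Code written for 7b/13b assumes k has num_heads heads and breaks here.
# One common fix is to repeat each k/v head across its group of queries:
k = k.repeat_interleave(num_heads // num_kv_heads, dim=1)
assert k.shape[1] == q.shape[1]
```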

> I can directly use the weights generated by qlora?

If the weights are saved in float16, then yes, it doesn't have to match the model. And it should be possible...
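If the adapter was saved in some other dtype, casting it first is straightforward; a hypothetical snippet (the filenames are illustrative, not a fixed convention):

```python
import torch

# Cast every tensor in a saved LoRA adapter to float16 before use.
adapter = torch.load("adapter_model.bin", map_location="cpu")
adapter = {name: t.to(torch.float16) for name, t in adapter.items()}
torch.save(adapter, "adapter_model.fp16.bin")
```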

Well, it's not exactly a fix, because it should really work with fused attn, but I'll get to that. What I need, though, is an example 70b LoRA I can...

I'm still not sure what "dynamic" positional encodings actually means, and how you would use them with cached keys.

> Unsupported tensor Dtype

Have you updated ExLlama to the latest version? I only added bfloat16 very recently; it probably hasn't made it into the library yet.

A LoRA does add some overhead, especially when it's targeting all layers with rank-64 adapters. I really would caution everyone training these adapters not to crank up the rank thinking...
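Some back-of-envelope numbers on why the rank matters (hidden size 8192 assumed, as in 70b; this is arithmetic, not a profile):

```python
hidden = 8192
for rank in (8, 16, 64):
    # A LoRA adapter adds two matmuls per adapted matrix, (hidden x rank)
    # and (rank x hidden), so the extra weights, and roughly the extra
    # per-token work, grow linearly with the rank.
    extra = 2 * hidden * rank
    print(f"rank {rank:>2}: {extra:,} extra params per adapted matrix")
```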

> The correct implementation should cache the kv-embeddings before applying RoPE, as the RoPE embedding of every token changes when s changes.

This is the part that doesn't make sense to...
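To make the point at issue concrete, a toy sketch of the two caching orders (my own minimal RoPE below, nothing from ExLlama):

```python
import torch

def rope(x, positions, base=10000.0):
    # Minimal interleaved RoPE; x has shape (seq_len, head_dim).
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    ang = torch.outer(positions.float(), inv_freq)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

k_raw = torch.randn(8, 128)
pos = torch.arange(8)

# (a) Cache keys after RoPE: cheap at attention time, but the rotation
#     angles are baked in and can't be changed later.
cache_post = rope(k_raw, pos)

# (b) Cache keys before RoPE and rotate at attention time: a scaling
#     factor can change between steps, at the cost of re-rotating the
#     whole cache every step.
cache_pre = k_raw.clone()
k_used = rope(cache_pre, pos)
```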

I'm not sure what those do exactly, especially since the default RoPE implementation already adapts to the hidden dimension of the model. But the hidden dimension of the model is...
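For reference, the standard RoPE frequency table (as in the Llama reference code): the exponent is normalized by the rotary dimension (the head dimension in Llama), which is the sense in which the default implementation already adapts to the model's dimensions:

```python
import torch

head_dim, base = 128, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Angle table for up to 2048 positions, shape (seq_len, head_dim / 2).
angles = torch.outer(torch.arange(2048).float(), inv_freq)
cos, sin = angles.cos(), angles.sin()
```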

I haven't tested 70B on A100 before, but the speed is close to what I've seen for 65B on A100, so I think this is about what's expected, yes.