CUDA build file size
Something happened to the size of the CUDA build: it absolutely exploded. Is it expected/intended to be that big with the latest release?
That should be due to optimizations/specializations done upstream in llama.cpp.
But most of those should be unused here. It's also an NxM-type problem, because the build emits code for N quantization specializations for each of M CUDA architectures.
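To make the NxM point concrete, here is a rough sketch of how the number of compiled kernel variants grows. The quantization names and architecture list are made up for illustration and are an assumption, not the actual build configuration of this project or of llama.cpp.

```python
from itertools import product

# Illustrative values only (assumption): not the real build configuration.
QUANT_TYPES = ["q4_0", "q4_1", "q5_0", "q5_1", "q8_0"]        # N = 5
CUDA_ARCHS = ["sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]    # M = 5


def kernel_variants(quant_types, archs):
    """Return every (quantization, architecture) pair that would get its own compiled code."""
    return list(product(quant_types, archs))


# Full matrix: every specialization is compiled for every architecture.
all_variants = kernel_variants(QUANT_TYPES, CUDA_ARCHS)

# Same specializations, but targeting only two architectures.
trimmed = kernel_variants(QUANT_TYPES, ["sm_75", "sm_86"])

print(len(all_variants))  # 25
print(len(trimmed))       # 10
```

Under these assumptions, trimming either list shrinks the product, so dropping unused specializations or targeting fewer architectures would both cut the binary size.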
Hmm, ok, but it would be really great if it could be addressed at least partially, since a 7.5x increase in size seems a bit unreasonable. (That is mostly due to my practical problem of it now being too big to distribute in the C# ecosystem :p)
#395 may help with this.