zygi
> amazing thanks, lots going on right now but will defo try to merge in the next couple of days. When i played with it in the past i didn't...
**CPU benchmarks**: there's a massive improvement. Time to generate an example sentence drops from ~30min to 2min. Tested on an i9-13900K, with the following code: ```python from bark import SAMPLE_RATE,...
Oops, I was misreading the code and focused on the wrong path. The CUDA.jl behavior is defined [here](https://github.com/JuliaGPU/CUDA.jl/blob/d79adbfd090b0e51ccaf4c74710eaa610e0bf998/lib/cublas/wrappers.jl#L903) since we're using gemmEx. In that case we probably need much less...
Thanks for the reply!

> No, that just changes the computational domain

Sorry for being unclear, this is exactly what I had in mind. The desired behavior is to read...
> Ah OK, I was confused by the mention of cublasSgemm, where AFAIK you can't do this (the input/output types are Float32).

Sorry, yes, I meant `cublasSgemmEx` there.

> Does...