alph4b3th

Results: 19 comments by alph4b3th

The problem is not serge; it's in llama.cpp. Something doesn't work well with Docker. I saw `npx dalai serve` run and the model responded in 3-5 seconds... with Docker here...

I installed it outside of Docker and did not get results different from those already mentioned above. What could it be? Because I've seen some people running Alpaca 7B and it...

I discovered that the problem is in how the new version of llama.cpp is compiled: the flags passed to the compiler are making the software slower, and that...

You can try reading the [thread](https://github.com/ggerganov/llama.cpp/issues/603).

> I'm pleased to report that as of the latest commit ([cf84d0c](https://github.com/nsarrazin/serge/commit/cf84d0c7f52454657457dab57cbd6325777f58cc)) the performance is much better, at least on my CPUs, which were impossibly slow before.
>
> cc...

> > What is your hardware? How long does it take to answer you?
>
> I'm running on an AMD EPYC VPS with 6 cores and 16 GB of RAM and it...

> > Could you explain to me in detail how Bitcoin works? I would like a technical article in language for laymen.
>
> 4 threads, using the 13B model...

> In reply to @gotzmann:
>
> > Implementing INT4/INT8 quantization and using AVX instructions can be challenging, mainly due to the limitations of INT8 multiplication instructions. However, here are some...

To use AVX2 instructions in Go, you can use assembly language and the `go:generate` directive. Here is an example of how to perform INT8 vector multiplication using AVX2 instructions in...