alph4b3th

Results: 19 comments by alph4b3th

The problem is not serge; it's in llama.cpp. Something doesn't work well with Docker. I saw `npx dalai serve` run and the model responded in 3-5 seconds... with Docker here...

I installed it outside of Docker and did not get results different from those already mentioned above. What could it be? Because I've seen some people running Alpaca 7B and it...

I discovered that the problem is in how the new version of llama.cpp is compiled: the flags passed to the compiler are making the software slower, and that...

You can try reading the [thread](https://github.com/ggerganov/llama.cpp/issues/603).

> I'm pleased to report that as of the latest commit ([cf84d0c](https://github.com/nsarrazin/serge/commit/cf84d0c7f52454657457dab57cbd6325777f58cc)) the performance is much better, at least on my CPUs, which were impossibly slow before.
>
> cc...

> > What is your hardware? How long does it take to answer you?
>
> I'm running on an AMD EPYC VPS with 6 cores and 16 GB of RAM and it...

> > Could you explain to me in detail how Bitcoin works? I would like a technical article in language for laymen.
>
> 4 threads, using the 13B model...

> In reply to @gotzmann:
>
> > Implementing INT4/INT8 quantization and using AVX instructions can be challenging, mainly due to the limitations of INT8 multiplication instructions. However, here are some...

To use AVX2 instructions in Go, you can use assembly language and the `go:generate` directive. Here is an example of how to perform INT8 vector multiplication using AVX2 instructions in...