Support threading on all platforms
So far multithreading only works on Linux. With a few tweaks in load balancing I have planned, this should work (as in guaranteed) on all platforms.
Really, this is a core feature that should work everywhere.
SO_REUSEPORT on macOS absolutely does not work for load balancing. Even if you create 4 listen sockets and poll them for accept with kqueue, only the first created socket gets events. It doesn't matter in what order you poll; events always go to the first created listen socket. As soon as you close that listen socket, events start arriving on the second socket created. So macOS does not really support this threading solution.
FreeBSD has SO_REUSEPORT_LB which works, and Linux has SO_REUSEPORT.
Windows, Linux and FreeBSD should have the features needed for this threading, but macOS does not.
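For anyone who wants to reproduce the behavior described above, here is a minimal sketch of opening several listen sockets on the same port. It is not uWS code: `open_shared_listener` and `bound_port` are hypothetical helper names, and the loopback address and backlog of 128 are arbitrary assumptions. It picks SO_REUSEPORT_LB where the headers define it (FreeBSD) and falls back to SO_REUSEPORT otherwise.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper (not part of uWS): open a listen socket on
// 127.0.0.1:port with port sharing enabled. Returns the fd, or -1 on error.
// Passing port 0 lets the kernel pick a free port.
int open_shared_listener(std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

#if defined(SO_REUSEPORT_LB)
    const int option = SO_REUSEPORT_LB;  // FreeBSD load-balancing variant
#else
    const int option = SO_REUSEPORT;     // Linux (and macOS, without balancing)
#endif
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, option, &on, sizeof(on)) != 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 128) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: report which port a listener ended up bound to,
// or 0 if the lookup fails.
std::uint16_t bound_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) return 0;
    return ntohs(addr.sin_port);
}
```

Each thread would call `open_shared_listener` with the same port and poll only its own fd. Whether the kernel then spreads incoming connections across those sockets is exactly the platform difference discussed above; a successful bind alone does not tell you that.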
Just a theoretical question: how was Node.js able to make this work on macOS?
maybe relevant?
https://just.billywhizz.io/blog/on-javascript-performance-01/
this guy says when he switched from threading V8 to a multi-process strategy, he saw a huge perf boost:
I eventually tracked the issue down to this code. It seems that every time we call ArrayBuffer->GetBackingStore() v8 uses a mutex to avoid races on reading the backing memory. I am guessing this is because the GC runs on a separate thread but need to investigate further. I raise the issue on the v8-users google group and am hoping I can put together a reproducible issue report for the v8 team to look at.
At this point, I was really scratching my head as to what I could do but while testing on the packet server I noticed that the issue got worse the more threads I used. The initial submission was spawning a thread for each server instance so I decided to try using processes instead of threads.
This turned out to be the breakthrough I was hoping for and I saw a huge improvement on the packet server over the threaded approach. I am still not too sure why this is but my guess would be the v8 heap and/or GC are shared across the threads when using a thread for each v8 Isolate which means lots of contention on those mutexes reading from the ArrayBuffers.
Node.js has process forks with Cluster and threading with Worker_Threads. Neither currently works with uWS.js on Windows or, I guess, macOS; they only work on Linux. They do work with the built-in Node.js http module, so it should be possible.
uWS prefers using modern kernel features for this. For instance, in FreeBSD there is https://reviews.freebsd.org/D11003. Linux has a similar feature, which we use.
You can make it work with old common Unix forking, but that has the "thundering herd" problem and/or a single accept thread being overloaded. What we use here on Linux is way better than what Node.js uses, since it is a kernel feature specifically intended for load balancing.
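To make the "single accept thread" design concrete, here is a rough sketch of the hand-off step: one acceptor assigns each accepted fd to a worker queue in round-robin order. This is only an illustration of the general pattern, not how uWS or Node.js implement it; `RoundRobinDispatcher` and its methods are hypothetical names, and the fds in the usage note are plain integers standing in for accepted sockets.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical sketch: a single acceptor thread calls dispatch() for every
// accepted connection; each worker thread drains its own queue with take().
class RoundRobinDispatcher {
public:
    // workers must be greater than zero.
    explicit RoundRobinDispatcher(std::size_t workers)
        : queues_(workers), locks_(workers) {}

    // Called only from the acceptor thread, so next_ needs no locking.
    // Returns the index of the worker that received the fd.
    std::size_t dispatch(int fd) {
        const std::size_t target = next_;
        next_ = (next_ + 1) % queues_.size();
        std::lock_guard<std::mutex> guard(locks_[target]);
        queues_[target].push_back(fd);
        return target;
    }

    // Called from worker threads. Pops the oldest pending fd for this worker;
    // returns false if nothing is queued.
    bool take(std::size_t worker, int &fd) {
        std::lock_guard<std::mutex> guard(locks_[worker]);
        if (queues_[worker].empty()) return false;
        fd = queues_[worker].front();
        queues_[worker].pop_front();
        return true;
    }

    // Number of fds currently waiting for the given worker.
    std::size_t pending(std::size_t worker) {
        std::lock_guard<std::mutex> guard(locks_[worker]);
        return queues_[worker].size();
    }

private:
    std::vector<std::deque<int>> queues_;
    std::vector<std::mutex> locks_;
    std::size_t next_ = 0;
};
```

With three workers, six dispatched fds end up two per queue. The weak spot is the one named above: every connection passes through the single thread calling `dispatch`, so that thread can become the bottleneck under load, which kernel-level balancing avoids.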
@alexhultman sure is, this sounds like a sane decision, at least for Linux. However, IMHO it would be great if the macOS version supported some basic form of clustering as a fallback.
Probably just requires some effort to add for each platform. Windows has I/O completion ports: https://docs.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports
Yep, and it can be improved, but it requires a few changes and I'm currently putting focus on QUIC.
Does it not work now?
There is working multithreading on all platforms now via the LoadBalancer example. This example will be cleaned up into a helper uWS::LocalCluster and moved into HelloWorldThreaded in the next release. Closing this issue as it (at least) works now.