Vladimir Shakhov comments

Results: 9 comments of Vladimir Shakhov

FaceDetection server crashing after approx. one minute of running on OSX

hi! seems not to happen with this opencv version ``` build> brew list --versions opencv3 opencv 3.3.0_3 ``` ``` build> ./src/Stringless -d 4 -s 200 -p ../ext/dlib/shape_predictor_68_face_landmarks.dat Camera output: 320x240...

Use Threadpool to schedule the work

did run the benchmark script on m1 mac, it's up to 10 threads, orange is threadpool, blue is master (as of when the branch was forked = eeaa7b0492fc79baab8bb1fe195d6c87159f2bd3) token time: ![master-blue-vs-threadpool-orange](https://user-images.githubusercontent.com/65818/232226028-6f7df7ca-a4ed-4e16-b331-b002af479dcd.png) cant explain why we...

threads: changing to a mutex/condvar based thread pool.

Thread pool Mac 7B threads 6 n_predict 64 ``` llama_print_timings: load time = 625.00 ms llama_print_timings: sample time = 46.18 ms / 64 runs ( 0.72 ms per run) llama_print_timings:...

threads: changing to a mutex/condvar based thread pool.

activity monitor for thread pool master sampling profiler, just for reference thread pool: master

threads: changing to a mutex/condvar based thread pool.

i was thinking how to easily show cpu time spent spinlocking vs being blocked on the threadpool - this change https://github.com/bogdad/llama.cpp/pull/7/files extracts the spinning portions of ggml_graph_compute and ggml_graph_compute_thread on...

threads: changing to a mutex/condvar based thread pool.

very cool! agree, strange indeed, i would expect master to be faster than the thread pool, hm. Is it that doing nothing (no finalize) with spinlocks so much faster than...

threads: changing to a mutex/condvar based thread pool.

oh, i missed that. the tldr: this is just an exploration of how llama.cpp would behave if there were no busy waiting. was not supposed to be merged, because...
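For readers unfamiliar with the idea behind these comments, here is a minimal sketch of a thread pool whose idle workers block on a mutex and condition variables instead of busy waiting. It is written in Python for brevity and is not the llama.cpp or ggml code; the class `BlockingPool` and its methods are hypothetical names chosen only for illustration.

```python
import threading
from collections import deque


class BlockingPool:
    """Hypothetical minimal pool: idle workers sleep instead of spinning."""

    def __init__(self, n_threads):
        self._lock = threading.Lock()
        # Two condition variables sharing one mutex:
        self._work = threading.Condition(self._lock)  # signalled when a task is queued
        self._done = threading.Condition(self._lock)  # signalled when a task finishes
        self._tasks = deque()
        self._pending = 0      # tasks queued or currently running
        self._stop = False
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(n_threads)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            with self._lock:
                # Block (no spinning) until there is work or a shutdown request.
                while not self._tasks and not self._stop:
                    self._work.wait()
                if not self._tasks:
                    return  # shutdown requested and the queue is drained
                fn, args = self._tasks.popleft()
            try:
                fn(*args)  # run the task outside the lock
            finally:
                with self._lock:
                    self._pending -= 1
                    self._done.notify_all()

    def submit(self, fn, *args):
        with self._lock:
            self._tasks.append((fn, args))
            self._pending += 1
            self._work.notify()  # wake one sleeping worker

    def wait_all(self):
        # Block the caller until every submitted task has finished.
        with self._lock:
            while self._pending:
                self._done.wait()

    def shutdown(self):
        with self._lock:
            self._stop = True
            self._work.notify_all()
        for t in self._threads:
            t.join()
```

The trade-off under discussion is that a blocked worker consumes no CPU while idle but pays a wake-up cost on each task, whereas a spinning worker reacts immediately at the price of burning CPU time.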

Tensor.where with infinity values returns NAN

fwiw, the following diff makes the TestOps.test_inf_where pass with GPU=1 ``` --- a/tinygrad/runtime/ops_gpu.py +++ b/tinygrad/runtime/ops_gpu.py @@ -53,6 +53,10 @@ class CLProgram: def __init__(self, name:str, prg:str, binary=False, argdtypes=None, options=None): self.name, self.argdtypes,...

MicrophoneArray.append(MicrophoneArray), do not crash when signal exists

@fakufaku hey Robin, sure, did rebase, np, glad to be of help!