Vladimir Shakhov
hi! seems not to happen with this opencv version
```
build> brew list --versions opencv3
opencv 3.3.0_3
```
```
build> ./src/Stringless -d 4 -s 200 -p ../ext/dlib/shape_predictor_68_face_landmarks.dat
Camera output: 320x240...
```
did run the benchmark script on m1 mac, it's up to 10 threads, orange is threadpool, blue is master (when the branch was forked = eeaa7b0492fc79baab8bb1fe195d6c87159f2bd3) token time: can't explain why we...
Thread pool Mac 7B threads 6 n_predict 64
```
llama_print_timings: load time = 625.00 ms
llama_print_timings: sample time = 46.18 ms / 64 runs ( 0.72 ms per run)
llama_print_timings:...
```
activity monitor for thread pool and for master; sampling profiler, just for reference, for thread pool and for master
i was thinking how to easily show cpu time spent spinlocking vs being blocked on the threadpool - this change https://github.com/bogdad/llama.cpp/pull/7/files extracts the spinning portions of ggml_graph_compute and ggml_graph_compute_thread on...
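a rough way to see the same effect outside of ggml: the sketch below is not the code from the linked PR, just a standalone Python illustration of spinning vs blocking. `measure_wait` and its parameters are hypothetical names made up for this example. it compares wall time with per-thread CPU time for a worker that busy-waits on a flag and a worker that blocks on an event.

```python
import threading
import time


def measure_wait(kind, hold_seconds=0.2):
    """Return (wall_seconds, cpu_seconds) a worker spends waiting for a signal.

    Hypothetical helper for illustration only; not part of llama.cpp or ggml.
    """
    ready = threading.Event()
    result = {}

    def worker():
        wall_start = time.perf_counter()
        cpu_start = time.thread_time()  # CPU time of this thread only
        if kind == "spin":
            # Busy wait: poll the flag in a tight loop, burning CPU.
            while not ready.is_set():
                pass
        else:
            # Blocking wait: the thread is parked until it is signalled.
            ready.wait()
        result["wall"] = time.perf_counter() - wall_start
        result["cpu"] = time.thread_time() - cpu_start

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(hold_seconds)  # stand-in for the main thread preparing work
    ready.set()
    t.join()
    return result["wall"], result["cpu"]


spin_wall, spin_cpu = measure_wait("spin")
block_wall, block_cpu = measure_wait("block")
print(f"spin:  wall={spin_wall:.3f}s cpu={spin_cpu:.3f}s")
print(f"block: wall={block_wall:.3f}s cpu={block_cpu:.3f}s")
```

both workers wait roughly the same wall time, but the spinning one should report CPU time close to its wall time while the blocked one should report close to zero. exact numbers depend on the machine and the scheduler.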
very cool! agree, strange indeed, i would expect master to be faster than the thread pool, hm. Is it that doing nothing (no finalize) with spinlocks is so much faster than...
oh, i missed that. the tldr: this is just an exploration of how llama.cpp would behave if there were no busy waiting. was not supposed to be merged, because...
fwiw, the following diff makes TestOps.test_inf_where pass with GPU=1
```
--- a/tinygrad/runtime/ops_gpu.py
+++ b/tinygrad/runtime/ops_gpu.py
@@ -53,6 +53,10 @@ class CLProgram:
   def __init__(self, name:str, prg:str, binary=False, argdtypes=None, options=None):
     self.name, self.argdtypes,...
```
@fakufaku hey Robin, sure, did rebase, np, glad to be of help!