kormann
It seems there are two possible solutions. **Swap idea:** while using the model, take half of the input tokens and start building inference over them in the background. When out...
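Roughly what I have in mind, as a toy sketch. The `Context`/`feed` names and `MAX_TOKENS` are made up for illustration, not a real whisper API:

```python
# Toy sketch of the swap idea; Context, feed(), MAX_TOKENS are hypothetical.
class Context:
    def __init__(self):
        self.tokens = []
    def feed(self, tokens):  # "building inference": feed tokens into the model
        self.tokens.extend(tokens)

MAX_TOKENS = 8

def step(active, background, new_token):
    active.feed([new_token])
    # once the active context is half full, start prefilling a background
    # context with the newer half of the tokens
    if background is None and len(active.tokens) >= MAX_TOKENS // 2:
        background = Context()
        background.feed(active.tokens[len(active.tokens) // 2:])
    elif background is not None:
        background.feed([new_token])
    # when the active context runs out of room, swap to the background one
    if len(active.tokens) >= MAX_TOKENS:
        active, background = background, None
    return active, background
```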
@tjohnman First off, I'm just a layman too :) "Building inference" is just me trying to describe that you need to feed the previous tokens into the model again. I understand...
When the context gets full during recording, I create a new context with only the text from the current frame. I want to make sure to only do this once, to avoid a loop. This...
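Something like this, as a minimal sketch (the names are hypothetical, not the real API):

```python
# Toy sketch of the reset-on-full logic; names are hypothetical.
MAX_TOKENS = 8

def on_frame(ctx_tokens, frame_tokens, already_reset):
    if len(ctx_tokens) + len(frame_tokens) > MAX_TOKENS and not already_reset:
        # context is full: start a fresh context seeded only with the text
        # from the current frame, and do it at most once to avoid a loop
        return list(frame_tokens), True
    return ctx_tokens + list(frame_tokens), already_reset
```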
Working on speed optimization right now; at about 8.5x realtime on a single GPU with whisper large.

```
TIMESTAMPS=1 MODEL=large python examples/whisper.py https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_balaji_srinivasan.mp3
```
Generalized to `vectorize(gep(val) * n) -> val`.
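To make the rule concrete, here's a self-contained toy version of the rewrite (not the real tinygrad pattern matcher):

```python
# Toy rewrite: VECTORIZE(GEP(val, 0), GEP(val, 1), ..., GEP(val, n-1)) -> val
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec:          # a vector value of width n
    name: str
    n: int

@dataclass(frozen=True)
class Gep:          # extract element i from a vector
    src: Vec
    i: int

@dataclass(frozen=True)
class Vectorize:    # build a vector from n scalar elements
    srcs: tuple

def rewrite(u):
    # if every source is gep(val, i) of the same val, in order 0..n-1,
    # the vectorize is a no-op and collapses back to val
    if isinstance(u, Vectorize) and u.srcs and all(isinstance(s, Gep) for s in u.srcs):
        val = u.srcs[0].src
        if len(u.srcs) == val.n and all(s.src == val and s.i == i for i, s in enumerate(u.srcs)):
            return val
    return u

v = Vec("x", 4)
assert rewrite(Vectorize(tuple(Gep(v, i) for i in range(4)))) == v
```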
Doing some more digging, I found that the CLANG compiler seems to produce n programs of size n.
```python
from tinygrad import nn, Tensor, Device
from tinygrad.engine.realize import method_cache
from tinygrad.helpers import DEBUG

T = 80
Device.DEFAULT = "CLANG"   # force the CLANG backend
DEBUG.value = 3            # print the generated programs
method_cache.clear()       # start from a cold kernel cache
x = Tensor.rand(2)
for _ in range(T):  # loop body elided in the original
    ...
```
create_schedule creates ASTs that don't recompute over and over. As I understand it, that's because:
1. a kernel can only store once
2. a reduce op must be the last operation?

```python
DEVICE = ...
```
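A quick way to see rule 2 in action (this is my expectation from the rule, not verified output): a reduce feeding another reduce can't fuse, so this should schedule as two kernels:

```python
# run with DEBUG=2 to see the kernels; I'd expect two here, since each
# kernel's reduce must be its last op, so the inner sum can't fuse
from tinygrad import Tensor

x = Tensor.rand(4, 4)
y = x.sum(axis=1).sum()  # reduce feeding a reduce
y.realize()
```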
It will only print if the difference is more than 10% and more than 10 ms.
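i.e. roughly this check (a sketch of the thresholds, not the exact code):

```python
def maybe_print(name, t_old_ms, t_new_ms):
    # report only when the difference is both relatively (>10%) and
    # absolutely (>10 ms) significant
    diff = abs(t_new_ms - t_old_ms)
    if diff > 0.10 * t_old_ms and diff > 10:
        print(f"{name}: {t_old_ms:.2f} ms -> {t_new_ms:.2f} ms")
```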
The time results are super off for some kernels; I must be missing some caching or optimization.