Mark O'Connor

Results: 28 comments by Mark O'Connor

The Meta Llama 3 Hugging Face repo's config.json was updated from 8 to 32 a couple of weeks after release: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/commit/c4219cc9e642e492fd0219283fa3c674804bb8ed

![Image](https://github.com/user-attachments/assets/76db48ce-51e9-4672-b743-eb8d63e5a947)

It appears this was a bugfix and they did...
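Since the fix was a single scalar change in config.json, one quick way to spot what changed between two revisions is to diff the parsed JSON. A minimal sketch; the field name and values below are placeholders, not the actual contents of that commit:

```python
import json

def diff_configs(old_json: str, new_json: str) -> dict:
    """Return {key: (old, new)} for keys whose values differ between two config.json revisions."""
    old, new = json.loads(old_json), json.loads(new_json)
    return {
        k: (old.get(k), new.get(k))
        for k in old.keys() | new.keys()
        if old.get(k) != new.get(k)
    }

# Hypothetical before/after snippets -- NOT the real Llama 3 config contents.
before = '{"num_attention_heads": 32, "some_field": 8}'
after = '{"num_attention_heads": 32, "some_field": 32}'

print(diff_configs(before, after))  # {'some_field': (8, 32)}
```

The same function works on the full config.json files downloaded at two different commit hashes.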

[Notified the Meta repo](https://github.com/meta-llama/llama-models/issues/241) so they can fix this on their end too.

These are the two files showing the issue: [ops_perf_results_2025_11_24_09_41_18.csv.gz](https://github.com/user-attachments/files/23715194/ops_perf_results_2025_11_24_09_41_18.csv.gz) [ops_perf_results_2025_11_24_09_41_07.csv.gz](https://github.com/user-attachments/files/23715193/ops_perf_results_2025_11_24_09_41_07.csv.gz)

Interesting, thanks for the quick and thorough investigation, Mo 🙏 So the TL;DR is that the PagedUpdateCacheOp is filtered _on the host side_, which breaks the otherwise solid metal assumption...
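As a conceptual illustration of why host-side filtering breaks a lockstep assumption (everything here is hypothetical pseudocode, not tt-metal APIs or op names): if the host silently drops an op from some devices' programs while the mesh assumes every device runs the same sequence, the per-device programs diverge.

```python
# Hypothetical mesh of 4 devices; each entry is the op list the host dispatches.
# Names are illustrative only -- not real tt-metal ops.
mesh_program = {dev: ["matmul", "paged_update_cache", "softmax"] for dev in range(4)}

def host_side_filter(programs, skip_op, skip_devices):
    """Simulate the host dropping an op on a subset of devices."""
    return {
        dev: [op for op in ops if not (op == skip_op and dev in skip_devices)]
        for dev, ops in programs.items()
    }

filtered = host_side_filter(mesh_program, "paged_update_cache", skip_devices={2, 3})

# The lockstep assumption: every device executes the identical op sequence.
in_lockstep = len({tuple(ops) for ops in filtered.values()}) == 1
print(in_lockstep)  # False -- devices 2 and 3 now run a shorter program
```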

Seems like the only _correct_ way of using these ops is to stop passing mesh_coords and instead mask out the page table with -1 for places that shouldn't be...
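A sketch of what that masking could look like (the table shape and helper are hypothetical): instead of routing via mesh_coords, entries a given device should not touch are overwritten with a -1 sentinel.

```python
# Hypothetical page table: one row per user/batch slot, one column per block id.
page_table = [
    [0, 1, 2, 3],    # user 0: active on this device
    [4, 5, 6, 7],    # user 1: active on this device
    [8, 9, 10, 11],  # user 2: lives on another device
]

def mask_page_table(table, active_rows):
    """Replace rows not owned by this device with -1 sentinels."""
    return [
        row if i in active_rows else [-1] * len(row)
        for i, row in enumerate(table)
    ]

masked = mask_page_table(page_table, active_rows={0, 1})
print(masked[2])  # [-1, -1, -1, -1]
```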

From @mtairum:

> The way we're doing batch prefill is basically by concatenating the inputs over the sequence length.
> So for 128 seqlen and batch-32 (currently what we support...
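Concretely, the concatenation scheme in the quote can be sketched like this (a pure-Python stand-in for the actual tensor code, using the 128 seqlen x batch-32 numbers above; the token values are made up):

```python
seq_len, batch = 128, 32

# Hypothetical per-user token id lists, each of length seq_len.
inputs = [[user * 1000 + t for t in range(seq_len)] for user in range(batch)]

# Batch prefill: concatenate over the sequence dimension into one long sequence.
concatenated = [tok for user_tokens in inputs for tok in user_tokens]
assert len(concatenated) == seq_len * batch  # 4096 tokens total

# Each user's slice is recoverable by offset.
def user_slice(flat, user):
    return flat[user * seq_len:(user + 1) * seq_len]

print(user_slice(concatenated, 5) == inputs[5])  # True
```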

@jvegaTT I'm taking over our side of this while @pprajapatiTT is away for the week; please let me know as soon as you find something out.

I believe this is because broadcasting is not supported; we also see this bringing up Llama 3.2, with the following test failing for all elementwise ops:

```
@pytest.mark.parametrize(
    "dtype", [...
```