Mark O'Connor
At least one of the files in the 68MB file at https://drive.google.com/file/d/1WYfgr31T-PPwMcxuAq09XZfHQO5Mw8fE/view?usp=sharing is truncated in the middle of the list of intervals, with several lines missing and no closing "
Mistral attention (branch: mistral-fast-attention d3b355a6a6e19827517deb5fdb3a91c03f079ea6) has very slow tensor deallocate calls. To reproduce on the above checkout: `pytest models/demos/mistral7b/tests/test_mistral_attention.py` ``` ... Test | INFO | Small tensor algorithm selected 22...
Using a shared_ptr in FreeList is unreasonably slow because it uses atomic operations for all the reference counts, and these get hit every time the list is iterated....
The `use_program_cache` pytest fixture only enables the program cache if combined with `device` or `all_devices` but silently does nothing when used with `t3k_device_mesh`. This was... interesting to track down.
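One way to make this kind of failure loud rather than silent is to validate the requested fixture names up front. The sketch below is illustrative only: `check_program_cache_request` and `SUPPORTED_DEVICE_FIXTURES` are hypothetical names, not part of the actual test harness, and the fixture names are simply those mentioned above. In a real pytest fixture the list would come from `request.fixturenames`.

```python
# Hypothetical guard, not the real conftest code: it only shows the idea of
# raising when `use_program_cache` is requested without a supported device fixture.
SUPPORTED_DEVICE_FIXTURES = {"device", "all_devices"}


def check_program_cache_request(fixturenames):
    """Return True if the program cache can be enabled, False if it was not requested.

    Raises RuntimeError when `use_program_cache` is requested alongside no
    supported device fixture, instead of silently doing nothing.
    """
    requested = set(fixturenames)
    if "use_program_cache" not in requested:
        return False
    if not requested & SUPPORTED_DEVICE_FIXTURES:
        raise RuntimeError(
            "use_program_cache has no effect without one of: "
            + ", ".join(sorted(SUPPORTED_DEVICE_FIXTURES))
        )
    return True
```

With this check, a test requesting `use_program_cache` together with `t3k_device_mesh` would fail immediately with a readable message.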
**Describe the bug** ``` tt_input = tt_input.reshape(1, 2048, 4, 128) tt_output = ttnn.transpose(tt_input, 1, 2) ``` gives 0.0 PCC compared to Torch: ``` torch_ref = torch_input.view(1, 2048, 4, 128) torch_ref =...
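For readers unfamiliar with the metric, here is a minimal pure-Python sketch of what a transpose of dimensions 1 and 2 does to a flat row-major buffer, and of how a PCC (Pearson correlation coefficient) comparison is computed. The helper names `transpose_1_2` and `pcc` are hypothetical and the tiny shape is chosen for illustration; this is not the ttnn or Torch implementation.

```python
import math


def transpose_1_2(flat, shape):
    """Swap dims 1 and 2 of a row-major 4D tensor stored as a flat list."""
    n, a, b, c = shape
    out = [0.0] * len(flat)
    for i in range(n):
        for j in range(a):
            for k in range(b):
                for l in range(c):
                    src = ((i * a + j) * b + k) * c + l  # index in (n, a, b, c)
                    dst = ((i * b + k) * a + j) * c + l  # index in (n, b, a, c)
                    out[dst] = flat[src]
    return out


def pcc(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


shape = (1, 4, 2, 3)
flat = [float(v) for v in range(24)]
reference = transpose_1_2(flat, shape)
```

A correct device result should give a PCC close to 1.0 against `reference`; a value near 0.0, as reported here, means the output is essentially uncorrelated with the expected data.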
**Describe the bug** WH transpose fails if W is an unaligned value such as 5. **To Reproduce** Can be trivially reproduced by adding `[[1, 1024, 5, 1280]], # Non page-aligned`...
**Describe the bug** Large DRAM-sharded matmuls cause wormhole to hang after a few iterations. Reducing the number of columns in the weight matrix to below 32k seems to work around...
### Ticket Fixes #15737, improves accuracy. ### Problem description Meta's reference model incorrectly used a rope scale_factor of 8 for the 1B and 3B models; it should be 32 as...
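To show where `scale_factor` enters, here is a sketch of Llama-3-style RoPE frequency scaling. It is an assumption-laden illustration, not the code from this PR: the constants `low_freq_factor=1`, `high_freq_factor=4` and `old_context_len=8192` are recalled from Meta's published reference scaling routine, and `theta=500000.0` and `head_dim=64` are illustrative values.

```python
import math


def base_freqs(head_dim, theta):
    """Unscaled RoPE frequencies, one per pair of head dimensions."""
    return [1.0 / (theta ** (i / head_dim)) for i in range(0, head_dim, 2)]


def scale_freqs(freqs, scale_factor,
                low_freq_factor=1.0, high_freq_factor=4.0, old_context_len=8192):
    """Llama-3-style scaling (constants are assumptions, see the lead-in)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            scaled.append(freq)  # high frequencies are left untouched
        elif wavelen > low_freq_wavelen:
            scaled.append(freq / scale_factor)  # low frequencies are fully scaled
        else:
            # Smoothly interpolate between the scaled and unscaled frequency.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return scaled


freqs = base_freqs(head_dim=64, theta=500000.0)
scaled_8 = scale_freqs(freqs, scale_factor=8)
scaled_32 = scale_freqs(freqs, scale_factor=32)
```

Under these assumptions the highest frequencies are identical for both factors, while the lowest frequencies differ by a factor of 4, which is why a wrong `scale_factor` mainly affects long-range positional behaviour.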
### Ticket #21822 ### Problem description Added support for dense Qwen3 models and updated PERF.md and README.md with values for Qwen3-32B; the change runs all Qwen3 dense models: - Qwen3-0.6B - Qwen3-1.7B...
When using tracy on 2x Galaxy for DeepSeek bringup (8x8 device grid) we get a tracy file for each host. The `PagedUpdateCache` op corresponding to prefill `PagedUpdateCacheOpType::FILL` only appears on...