whatdhack
### System Info latest, linux ### Information - [X] The official example scripts - [ ] My own modified scripts ### 🐛 Describe the bug Looking through the llama-recipes,...
A simple 101 example showing how to set up, run, and visualize/analyze a PyTorch run would be very useful.
Is there a pure PyTorch implementation using torch.distributed.tensor.parallel instead of fairscale.nn.model_parallel? The Fairscale package looks a bit old, with not much activity lately. Also, it will be good to have...
I could not find any tag or branch corresponding to the different versions of Llama. Is there a better way to identify the different versions than by date?
I am trying to understand equation 7 in the [DeepSeek-V2 tech report](https://arxiv.org/html/2405.04434v5). Here are the points I am confused about. 1. Are qti, kti, and vti row vectors? shapes...
Blackwell support for Distributed GEMM
### Which component has the problem? CuTe DSL ### Bug Report **Describe the bug** Simple low precision arithmetic not working. torch ``` import torch def add_precision_sweep (): precisions = [torch.float16,...
### Which component has the problem? CuTe DSL ### Bug Report **Describe the bug** `pip install -e .` still creates 4.2.0.0. However, `pip install -e .` in python/CuTeDSL creates 4.3.0.dev0...
### Which component has the problem? CuTe DSL ### Bug Report **Describe the bug** with nvidia-cutlass and nvidia-cutlass-dsl 4.2.0.0 ``` python cutlass/examples/python/CuTeDSL/blackwell/tutorial_gemm/fp16_gemm_1.py nvidia_cutlass_dsl/python_packages/cutlass/cute/nvgpu/tcgen05/mma.py", line 153, in __post_init__ raise OpError( cutlass.cute.nvgpu.common.OpError:...
Looking for an sgemm example. Does anyone know where to find one?
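
For readers unfamiliar with the term, SGEMM is the single-precision general matrix multiply, C = alpha * A * B + beta * C. The sketch below is not the CUTLASS or CuTe DSL example being asked about; it is only a minimal pure-Python reference of what the operation computes, useful for checking a GPU kernel's output on small inputs. The function name `sgemm_reference` and the row-major flat-buffer layout are assumptions made for illustration.

```python
from array import array


def sgemm_reference(m, n, k, alpha, a, b, beta, c):
    """Reference C = alpha * A @ B + beta * C (hypothetical helper).

    a, b, c are flat row-major sequences of sizes m*k, k*n, and m*n.
    Results are stored as 32-bit floats via array('f'); the inner
    accumulation itself runs in Python's double precision, so this is
    a correctness reference, not a bit-exact model of a GPU kernel.
    """
    out = array("f", c)  # copy of C, stored in single precision
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            out[i * n + j] = alpha * acc + beta * c[i * n + j]
    return out


# Small usage example: 2x2 matrices in row-major order.
a = array("f", [1.0, 2.0, 3.0, 4.0])
b = array("f", [5.0, 6.0, 7.0, 8.0])
c = array("f", [1.0, 1.0, 1.0, 1.0])
result = sgemm_reference(2, 2, 2, 1.0, a, b, 0.0, c)
```

With `alpha = 1` and `beta = 0` the call reduces to a plain matrix product, which makes it a convenient baseline to compare against a tiled or tensor-core implementation.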