whatdhack issues

Results 21 issues of whatdhack

Is it possible to run Llama 3.1 without HF libraries?

### System Info
latest, linux
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### 🐛 Describe the bug
Looking through the llama-recipes,...

Documentation on how to use Chakra with PyTorch

A simple 101 example showing how to set up, run, and visualize/analyze a PyTorch run would be very useful.

question

Fairscale to tensor.parallel upgrade for reference_impl of model?

Is there a pure PyTorch implementation using torch.distributed.tensor.parallel instead of fairscale.nn.model_parallel? The Fairscale package looks a bit old, with not much activity lately. Also, it will be good to have...

What hash tags correspond to Llama 1, 2, 3 and 3.1?

Could not find any tag or branch corresponding to the different versions of Llama. Is there a better way to identify the different versions than by date?

Equation 7 in DeepSeek-V2 Technical Report.

I am trying to understand equation 7 in the [DeepSeek-V2 tech report](https://arxiv.org/html/2405.04434v5). Here are the points I am confused about. 1. Are qti, kti, and vti row vectors? shapes...

Adding Blackwell support for distributed GEMM.

Blackwell support for Distributed GEMM

inactive-90d

[BUG] "add_stub" not implemented for 'Float8_e4m3fn', 'Float4_e2m1fn_x2'

### Which component has the problem?
CuTe DSL
### Bug Report
**Describe the bug**
Simple low precision arithmetic not working.

torch
```
import torch
def add_precision_sweep():
    precisions = [torch.float16,...
```

bug

? - Needs Triage

CuTe DSL

[BUG] Still getting nvidia-cutlass 4.2.0.0

### Which component has the problem?
CuTe DSL
### Bug Report
**Describe the bug**
`pip install -e .` still creates 4.2.0.0. However, `pip install -e .` in python/CuTeDSL creates 4.3.0.dev0...

bug

? - Needs Triage

CuTe DSL
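
When chasing a version mismatch like the one in the report above, it can help to ask the active Python environment which version it actually has registered, rather than relying on the installer output. The following is a minimal sketch using only the standard library's `importlib.metadata`. The helper names `installed_version` and `report_versions` are hypothetical, and the distribution names are taken from the reports in this list, not verified against any package index.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(dist_names):
    """Map each distribution name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Distribution names as they appear in the issue reports (assumed, not verified).
    names = ["nvidia-cutlass", "nvidia-cutlass-dsl"]
    for name, version in report_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this before and after each `pip install -e .` shows which of the two installs changed the recorded version.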

[BUG] cutlass.cute.nvgpu.common.OpError: OpError: expects arch to be one of ['sm_100a', 'sm_100f'], but got sm_121a

### Which component has the problem?
CuTe DSL
### Bug Report
**Describe the bug**
with nvidia-cutlass and nvidia-cutlass-dsl 4.2.0.0
```
python cutlass/examples/python/CuTeDSL/blackwell/tutorial_gemm/fp16_gemm_1.py
nvidia_cutlass_dsl/python_packages/cutlass/cute/nvgpu/tcgen05/mma.py", line 153, in __post_init__
    raise OpError(
cutlass.cute.nvgpu.common.OpError:...
```

bug

? - Needs Triage

CuTe DSL
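
The error in the report above comes down to an architecture string not being in an allowed list. As a rough illustration of that kind of check (not the library's actual implementation), the sketch below parses an `sm_XY[suffix]` string and tests it against the two values quoted in the error message. The names `parse_arch`, `is_supported`, and `SUPPORTED_ARCHS` are hypothetical.

```python
import re

# Allowed values copied from the error message in the report; this is an
# assumption for illustration, not a list read from the library.
SUPPORTED_ARCHS = ("sm_100a", "sm_100f")

_ARCH_RE = re.compile(r"^sm_(\d+)(\d)([a-z]?)$")


def parse_arch(arch):
    """Split an arch string such as 'sm_121a' into (major, minor, suffix)."""
    match = _ARCH_RE.match(arch)
    if match is None:
        raise ValueError(f"unrecognized arch string: {arch!r}")
    major, minor, suffix = match.groups()
    return int(major), int(minor), suffix


def is_supported(arch, supported=SUPPORTED_ARCHS):
    """Return True if the arch string is in the allowed list."""
    return arch in supported


if __name__ == "__main__":
    for arch in ("sm_100a", "sm_121a"):
        major, minor, suffix = parse_arch(arch)
        status = "accepted" if is_supported(arch) else "rejected"
        print(f"{arch}: compute capability {major}.{minor}, suffix {suffix!r}, {status}")
```

A pre-flight check like this makes it clear that `sm_121a` (compute capability 12.1) is simply outside the list the example expects, so the failure happens before any kernel is built.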

Is there any sgemm example (e.g. fp32)?

Looking for an sgemm example. Does anyone know where to find one?