whatdhack

Results 16 comments of whatdhack

A simple DQN example would be beneficial to someone getting started, given that DQNs are probably the very first deep RL algorithm one gets introduced to.

What is the best way to adapt the 8 checkpoints of the 70B model, made for A100-80GB/H100, to, say, 16 A100-40GB GPUs?

@subramen, it looks like there are more fundamental issues in adapting the 8-GPU checkpoint to any number of GPUs higher than 8. See the following. ` self.n_kv_heads = args.n_heads if...
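One possible reading of the problem, as a minimal sketch: it assumes the 70B model uses 8 KV heads (grouped-query attention) and that KV heads are divided across model-parallel ranks by integer division. The helper `local_kv_heads` and the constant `N_KV_HEADS_70B` are hypothetical names for illustration, not taken from the model code.

```python
# Hypothetical illustration; names below are assumptions, not the model's code.

def local_kv_heads(n_kv_heads: int, model_parallel_size: int) -> int:
    """KV heads each rank would own if heads are split by integer division."""
    return n_kv_heads // model_parallel_size


# Assumption: the 70B model uses 8 KV heads (grouped-query attention).
N_KV_HEADS_70B = 8

for mp in (1, 2, 4, 8, 16):
    heads = local_kv_heads(N_KV_HEADS_70B, mp)
    print(f"model_parallel_size={mp:2d} -> local KV heads per rank = {heads}")
```

Under these assumptions, any model-parallel size above 8 leaves each rank with zero KV heads, so resharding the weights alone would not be enough.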

Looks like it needs to be modified to get some metrics like the bf16TensorCoreGemm example.

After forcing cute to ignore the architecture checks [1](https://github.com/NVIDIA/cutlass/blob/e67e63c331d6e4b729047c95cf6b92c8454cba89/python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py#L166) and [2](https://github.com/NVIDIA/cutlass/blob/e67e63c331d6e4b729047c95cf6b92c8454cba89/python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py#L117), I am hitting the following MLIR issue. So it looks like tcgen05 is not supported on DGX Spark. Is...