darrenearl comments

Results: 6 comments of darrenearl

KeyError: 'CenterHead is not in the head registry'

> it seems you are using the Det3D repo instead of the current one
>
> "/media/exinova/ssd/CenterPoint-master_py/Det3D/det3d/models/detectors/point_pillars.py"

But when I uninstall det3d, the error still happens: `(centerpoint) exinova@exinova-B560M-AORUS-PRO-AX:/media/exinova/ssd/CenterPoint-master_py$ python tools/train.py configs/nusc/pp/nusc_centerpoint_pp_02voxel_two_pfn_10sweep.py`...
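For readers unfamiliar with this class of error, here is a minimal sketch of how a name-based registry can produce a message like the one in the title. The `Registry` class, `HEADS` instance, and `CenterHead` stub are hypothetical stand-ins written for illustration, not the actual Det3D or CenterPoint code. The point is only that a lookup fails when the module defining the class was never imported (and therefore never registered) in the package that is actually on the Python path.

```python
class Registry:
    """Hypothetical name -> class registry (illustration only)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register(self, cls):
        # Used as a decorator: runs when the defining module is imported.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key):
        if key not in self._modules:
            raise KeyError(f"{key} is not in the {self.name} registry")
        return self._modules[key]


HEADS = Registry("head")

# Before the class is registered, the lookup fails.
try:
    HEADS.get("CenterHead")
except KeyError as err:
    print("lookup failed:", err)


@HEADS.register
class CenterHead:
    """Stub standing in for the real head class."""


# After registration, the same lookup succeeds.
print(HEADS.get("CenterHead").__name__)
```

Under this assumption, a quick check is to print the `__file__` of the imported package to see which copy of the code Python is actually loading.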

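Question about the qwen2-1.5b model

> You can refer to #17.

assert self._attn_implementation == "sdpa": when I quantize with smooth enabled and rotation disabled, the error is raised at `assert self._attn_implementation == "sdpa"`, because eager mode is used by default. What are the requirements on the Qwen weights? qwen2-1.5b is tf16 by default.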
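As one thing to try, the attention backend can be requested explicitly at load time instead of relying on the default. The sketch below uses the `attn_implementation` argument of `from_pretrained` in Hugging Face Transformers. The helper names `sdpa_load_kwargs` and `load_with_sdpa` are hypothetical. It assumes the quantization script loads the model through `AutoModelForCausalLM`, which the comment does not confirm, and it reads the private `config._attn_implementation` attribute only for inspection.

```python
def sdpa_load_kwargs(torch_dtype):
    """Keyword arguments that explicitly request the SDPA attention backend."""
    return {"torch_dtype": torch_dtype, "attn_implementation": "sdpa"}


def load_with_sdpa(model_path):
    """Load a causal LM with SDPA attention (hypothetical helper).

    Imports are local so the module can be imported without torch installed.
    """
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_path, **sdpa_load_kwargs(torch.bfloat16)
    )
    # Private attribute; inspected only to see which backend was selected.
    print("attention backend:", model.config._attn_implementation)
    return model
```

If the printed backend is still `eager`, the installed Transformers version or the model class may not support SDPA for this architecture.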

Question about the qwen2-1.5b model

> I tried transformers==4.38.2, and it uses sdpa by default. Qwen has problems in eager mode with fp16/bf16; see https://github.com/huggingface/transformers/pull/33317.

I am using transformers 4.38.2, and quantizing the 7b model hits the same problem. The command is: `python3 examples/quant_model.py \ --model_path Qwen2.5-7B-Instruct \ --tokenizer_path Qwen2.5-7B-Instruct \ --dtype bfloat16 \ --smooth true \ --rotation false \ --dataset wikitext2 \ --nsamples 128 \ --w_quantizer FixedQuantize`...

Possibility of using different group size setting

I added the following code to the marlin kernel: `CALL_IF(1, 8, 8, 16) CALL_IF(1, 16, 4, 16) CALL_IF(2, 16, 4, 16) CALL_IF(3, 16, 4, 16) CALL_IF(4, 16, 4, 16)` and then ran test_w4a8.py again with groupsize=256, but it still fails: `FAIL: test_groups (__main__.Test) ---------------------------------------------------------------------- Traceback (most`...

Possibility of using different group size setting

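Hello, when I benchmarked the w4a8 marlin kernel against cuBLAS float32 on an A6000, a 3090, and a 4090, marlin was 4~6x slower than cuBLAS fp32. On an A100, however, I get performance close to what the marlin paper reports. What could be the reason?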

Possibility of using different group size setting

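> @darrenearl I think the HBM bandwidth of these cards may be what limits performance; the 4090, for example, has only half the bandwidth of the A100. I suggest using nsight-compute to look at the kernel's bottleneck, which makes the analysis easier.

OK, thanks.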
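The bandwidth hypothesis in the quoted reply can be sanity-checked with a rough roofline estimate before profiling. The sketch below is a simplified model, not a measurement. The function names are hypothetical, the byte count ignores quantization scales and cache reuse, and the bandwidth and peak-throughput values in the usage lines are illustrative round numbers that should be replaced with the vendor figures for the cards being compared.

```python
def gemm_bytes(m, k, n, weight_bits=4, act_bits=8, out_bits=16):
    """Approximate bytes moved for an (m x k) @ (k x n) GEMM.

    Counts one read of the weights and activations and one write of the
    output; ignores group scales and any cache reuse.
    """
    return (k * n * weight_bits + m * k * act_bits + m * n * out_bits) / 8


def roofline_time_s(m, k, n, bandwidth_gbs, peak_tops):
    """Lower bound on runtime (seconds) and which resource sets it."""
    t_mem = gemm_bytes(m, k, n) / (bandwidth_gbs * 1e9)
    t_compute = (2 * m * k * n) / (peak_tops * 1e12)
    bound = "memory" if t_mem >= t_compute else "compute"
    return max(t_mem, t_compute), bound


# Illustrative placeholders only: one card with half the bandwidth of the other.
for label, bandwidth_gbs in [("card A", 1000.0), ("card B", 2000.0)]:
    t, bound = roofline_time_s(16, 4096, 4096, bandwidth_gbs, peak_tops=300.0)
    print(f"{label}: >= {t * 1e6:.2f} us ({bound}-bound)")
```

If the measured kernel time is far above this lower bound on one card but close to it on another, bandwidth alone is unlikely to explain the gap, and a profiler run with nsight-compute is the better next step.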