chengyi

Results 2 comments of chengyi

@ginowu I also have a question. In the NVIDIA doc (https://docs.nvidia.com/cuda/parallel-thread-execution/#asynchronous-warpgroup-level-matrix-data-types) I see that `wgmma` with FP8 inputs can only use `fp16/fp32` as the accumulator. Or do you mean precision limitations not related to...
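For reference, the constraint in that PTX section is that `wgmma.mma_async` with FP8 (`e4m3`/`e5m2`) operands must accumulate into `fp16` or `fp32`. The sketch below only illustrates that pairing at the scalar level in plain CUDA (FP8 inputs, FP32 accumulator); it does not issue `wgmma` instructions, and the kernel and variable names are made up for the example. It assumes CUDA 11.8+ for `cuda_fp8.h`.

```cuda
// Illustrative only: FP8 e4m3 inputs with an FP32 accumulator, mirroring the
// doc's rule that FP8 wgmma operands accumulate in fp16/fp32. Not a wgmma call.
#include <cuda_fp8.h>
#include <cstdio>

__global__ void fp8_dot(const __nv_fp8_e4m3* a, const __nv_fp8_e4m3* b,
                        float* out, int n) {
    float acc = 0.0f;  // accumulator kept in fp32
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        // FP8 values are widened to float before the multiply-accumulate
        acc += static_cast<float>(a[i]) * static_cast<float>(b[i]);
    }
    atomicAdd(out, acc);
}

int main() {
    const int n = 256;
    __nv_fp8_e4m3 *a, *b;
    float* out;
    cudaMallocManaged(&a, n * sizeof(*a));
    cudaMallocManaged(&b, n * sizeof(*b));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) {
        a[i] = __nv_fp8_e4m3(1.0f);
        b[i] = __nv_fp8_e4m3(0.5f);
    }
    *out = 0.0f;
    fp8_dot<<<1, 128>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("dot = %f\n", *out);  // expect 128.0
    return 0;
}
```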

> I come across with the same issue, when pybind is built with NVIDIA TensorRT-LLM
>
> -- Found Torch: /home/ma-user/anaconda3/envs/py10-llm/lib/python3.10/site-packages/torch/lib/libtorch.so
> -- TORCH_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0
> -- Building for TensorRT version: 8.6.1,...