zhengjia
I'm using the training script from https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Classification/ConvNets/efficientnet_v2/S/training/AMP/convergence_8xA100.sh on my A100-80G node, with no parameter changes. I am getting a lot of errors like: ```yml 7: [1,5]: File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 59, in...
Hi experts, thanks for instant-ngp (ingp). It looks like ingp performs very well in both training hours and GPU memory. I am working on reconstructing a KITTI scene with ingp, but the results...
**Describe the bug**  **To Reproduce** `pip install unstructured` **Expected behavior** Only the specific version should be installed, not all versions.
### System Info GPU: H20 server CUDA Version: 12.5 Driver: 555.42.02 TRTLLM Commit: 2d234357c6e69fa514f6e9b4d4a5ad3bc431c4a6 built from source on linux ### Who can help? _No response_ ### Information - [X] The...
[.gitmodules](https://github.com/NVIDIA/TensorRT-LLM/blob/main/.gitmodules) always gives a broken git update in a normal download environment; it may be better to replace the URL with an SSH link. Thanks.
### Problem Description To debug `02_gemm_add_add_fastgelu` with the client API, I tried to enable `arg.Print()` under `Invoker::Run()` as follows: ```c++ // Invoker struct Invoker : public BaseInvoker { using Argument =...
### Problem Description During a Debug build, I am hitting R_X86_64_REX_GOTPCRELX (R_X86_64_PC32) out of range errors as follows: ```yml # issue1 [ 83%] Built target test_convnd_bwd_data ld.lld: error: ../../library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/mem/_ZN2ck16tensor_operation6device47DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3ILi3ENS_13tensor_layout11convolution6NDHWGCENS4_6GKZYXCENS_5TupleIJEEENS4_6NDHWGKEffffS8_fNS0_12element_wise11PassThroughESB_SB_LNS1_32ConvolutionForwardSpecializationE0ELNS1_18GemmSpecializationE7ELi64ELi16ELi16ELi128ELi8ELi8ELi16ELi16ELi1ELi1ENS_8SequenceIJLi16ELi4ELi1EEEENSE_IJLi1ELi0ELi2EEEESG_Li2ELi4ELi4ELi0ESF_SG_SG_Li2ELi4ELi4ELi0ELi1ELi1ENSE_IJLi1ELi16ELi1ELi4EEEELi4ELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE0EffE7Invoker3RunEPKNS1_12BaseArgumentERK12StreamConfig+0x10): relocation R_X86_64_REX_GOTPCRELX out of...
### Suggestion Description Hi HIP team, here is the CUDA version: ```c++ void atomic_add_gmem_h2(half2* addr, half2 in) { atomicAdd(addr, in); } ``` It looks like there's no HIP alternative yet; if built with...
Hi team, in `fully_fused_mlp.cu`, the following is hard to understand: ```c++ // If the output width is larger than 16 dims, we use cutlass to backpropagate through the last layer...
```sh python3 samples/mlp_learning_an_image_pytorch.py # with default albert.jpg & config.json ``` gives an error: ```yml ValueError: Can't write images with one color channel. ``` It looks like `write_image_imageio()` in `common.py` is missing handling for grayscale...
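A possible workaround for the grayscale case, as a minimal sketch only: I have not checked how `write_image_imageio()` is actually implemented, so `prepare_for_write` below is a hypothetical helper, and the assumption is that the image arrives as a float array in [0, 1] with shape (H, W, 1). The idea is to drop the trailing single channel so the writer receives a plain (H, W) array, then convert to 8-bit.

```python
import numpy as np


def prepare_for_write(img):
    """Hypothetical helper (not part of the samples): make an image array
    safe to hand to an image writer that rejects (H, W, 1) input."""
    img = np.asarray(img)

    # A single trailing channel is what triggers the error in this report,
    # so collapse (H, W, 1) to (H, W). RGB / RGBA arrays are left alone.
    if img.ndim == 3 and img.shape[2] == 1:
        img = img[:, :, 0]

    # Assumption: non-uint8 input holds floats in [0, 1]; scale to 8-bit.
    if img.dtype != np.uint8:
        img = (np.clip(img, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)

    return img
```

If this matches what the sample needs, the result could be passed to the existing writer in place of the raw array; whether that is the right place for the fix is for the maintainers to judge.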