zhengjia issues




[EfficientNetV2/Tensorflow2] OOM during training

I'm using the training script from https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Classification/ConvNets/efficientnet_v2/S/training/AMP/convergence_8xA100.sh on my A100-80G node with no parameter changes, and I am getting a lot of errors like: ```yml 7: [1,5]: File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 59, in...

bug

Any suggestions on the NeRF result from a KITTI scene (single RGB camera)?

Hi experts, thanks for instant-ngp (ingp). ingp appears to perform very well in both training time and GPU memory use. I am working on reconstructing a KITTI scene with ingp, but the results...

When running `pip install unstructured`, why are all versions of related modules installed?

**Describe the bug** ![recycle_python_modules](https://github.com/Unstructured-IO/unstructured/assets/3397714/b8e42572-9d7c-48ca-8bdb-e55575befd33) **To Reproduce** `pip install unstructured` **Expected behavior** Only a specific version should be installed, not all versions.

bug

gptSessionBenchmark failed due to invalid OptProfilerSelector shape

### System Info GPU: H20 server CUDA Version: 12.5 Driver: 555.42.02 TRTLLM Commit: 2d234357c6e69fa514f6e9b4d4a5ad3bc431c4a6 built from source on Linux ### Who can help? _No response_ ### Information - [X] The...

bug

Investigating

functionality issue

Better to replace submodule update links with SSH

[.gitmodules](https://github.com/NVIDIA/TensorRT-LLM/blob/main/.gitmodules) always breaks submodule updates in a normal download environment; it may be better to replace the URLs with SSH links. Thanks.

question
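One way to act on this locally, sketched in Python under the assumption that the submodules are hosted on GitHub: rewrite each `https://github.com/owner/repo` URL in a .gitmodules file into the `git@github.com:owner/repo.git` SSH form. The `https_to_ssh` helper and the sample entry are hypothetical; the actual contents of TensorRT-LLM's .gitmodules are not reproduced here.

```python
import re

# Matches GitHub HTTPS remotes such as https://github.com/owner/repo(.git),
# stopping at whitespace or end of string.
_HTTPS_GITHUB = re.compile(r"https://github\.com/([^/\s]+)/([^/\s]+?)(?:\.git)?(?=\s|$)")


def https_to_ssh(gitmodules_text: str) -> str:
    """Rewrite GitHub HTTPS URLs to SSH form; other text is left unchanged."""
    return _HTTPS_GITHUB.sub(r"git@github.com:\1/\2.git", gitmodules_text)


if __name__ == "__main__":
    # Hypothetical .gitmodules entry, for illustration only.
    sample = (
        '[submodule "3rdparty/example"]\n'
        "\tpath = 3rdparty/example\n"
        "\turl = https://github.com/example-org/example.git\n"
    )
    print(https_to_ssh(sample))
```

After editing .gitmodules, `git submodule sync` copies the new URLs into the local config. Alternatively, git's `url.<base>.insteadOf` setting (e.g. `git config --global url."git@github.com:".insteadOf "https://github.com/"`) rewrites the URLs on the client side without touching the file.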

[BUG]: Enabling arg.print() causes print errors

### Problem Description To debug `02_gemm_add_add_fastgelu` with the client API, I tried to enable arg.Print() in Invoker::Run() as follows: ```c++ // Invoker struct Invoker : public BaseInvoker { using Argument =...

Under Investigation

Debug build fails with R_X86_64_REX_GOTPCRELX | R_X86_64_PC32 out-of-range errors

### Problem Description During a Debug build, I get R_X86_64_REX_GOTPCRELX (R_X86_64_PC32) out-of-range errors as follows: ```yml # issue1 [ 83%] Built target test_convnd_bwd_data ld.lld: error: ../../library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/mem/_ZN2ck16tensor_operation6device47DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3ILi3ENS_13tensor_layout11convolution6NDHWGCENS4_6GKZYXCENS_5TupleIJEEENS4_6NDHWGKEffffS8_fNS0_12element_wise11PassThroughESB_SB_LNS1_32ConvolutionForwardSpecializationE0ELNS1_18GemmSpecializationE7ELi64ELi16ELi16ELi128ELi8ELi8ELi16ELi16ELi1ELi1ENS_8SequenceIJLi16ELi4ELi1EEEENSE_IJLi1ELi0ELi2EEEESG_Li2ELi4ELi4ELi0ESF_SG_SG_Li2ELi4ELi4ELi0ELi1ELi1ENSE_IJLi1ELi16ELi1ELi4EEEELi4ELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE0EffE7Invoker3RunEPKNS1_12BaseArgumentERK12StreamConfig+0x10): relocation R_X86_64_REX_GOTPCRELX out of...

Under Investigation
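For context on this error: in the x86-64 System V psABI, R_X86_64_PC32 stores S + A - P and R_X86_64_REX_GOTPCRELX stores G + GOT + A - P, both in a signed 32-bit field, so the linker reports "out of range" when the target lies more than about ±2 GiB from the referencing instruction. A very large Debug build of heavily templated code is a plausible way to reach that distance, though the actual layout of this build is not known here. A minimal Python model of the PC32 range check, using hypothetical addresses:

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def pc32_value(symbol: int, addend: int, place: int) -> int:
    """Value an R_X86_64_PC32 relocation stores: S + A - P."""
    return symbol + addend - place


def fits_pc32(symbol: int, addend: int, place: int) -> bool:
    """True if the PC-relative displacement fits in a signed 32-bit field."""
    return INT32_MIN <= pc32_value(symbol, addend, place) <= INT32_MAX


if __name__ == "__main__":
    place = 0x400000  # hypothetical address of the 32-bit field being patched
    print(fits_pc32(place + 0x1000, -4, place))     # True: target is 4 KiB away
    print(fits_pc32(place + (3 << 30), -4, place))  # False: target is ~3 GiB away
```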

[Feature Request]: atomicAdd() to support half2

### Suggestion Description Hi HIP team, here is the CUDA version: ```c++ void atomic_add_gmem_h2(half2* addr, half2 in) { atomicAdd(addr, in); } ``` It looks like there's no HIP alternative yet; if built with...

Under Investigation

feature request
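A common workaround when a packed `half2` atomic is unavailable is a compare-and-swap loop on the 32-bit word that holds both halves (in CUDA or HIP device code this would use `atomicCAS` on an `unsigned int*`). The Python sketch below models only the lane-wise fp16 arithmetic and the retry loop; `Word32`, `pack_half2`, `add_half2`, and `atomic_add_half2` are hypothetical names, the CAS is a single-threaded stand-in, and whether a given HIP release already offers a native `half2` overload is not checked here.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two fp16 values into one 32-bit word (low lane first, little-endian)."""
    return struct.unpack("<I", struct.pack("<ee", lo, hi))[0]


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two fp16 lanes (as Python floats)."""
    return struct.unpack("<ee", struct.pack("<I", word))


def add_half2(word: int, lo: float, hi: float) -> int:
    """Lane-wise add; the exact float64 sum of two fp16 values is rounded back to fp16."""
    a_lo, a_hi = unpack_half2(word)
    return pack_half2(a_lo + lo, a_hi + hi)


class Word32:
    """Single-threaded stand-in for a 32-bit memory location with CAS."""

    def __init__(self, value: int = 0):
        self.value = value

    def compare_and_swap(self, expected: int, desired: int) -> int:
        old = self.value
        if old == expected:
            self.value = desired
        return old


def atomic_add_half2(mem: Word32, lo: float, hi: float) -> int:
    """CAS retry loop: recompute the packed sum until no other writer intervened."""
    old = mem.value
    while True:
        assumed = old
        old = mem.compare_and_swap(assumed, add_half2(assumed, lo, hi))
        if old == assumed:
            return old  # like atomicAdd, return the previous value
```

Because both halves share one 32-bit word, a single successful CAS updates the two lanes together, which is what makes this emulation equivalent to a packed atomic add.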

[Question]: Should the output error in the backward pass be computed with the derivative of the activation function rather than the activation function itself?

Hi team, in `fully_fused_mlp.cu`, the following is hard to understand: ```c++ // If the output width is larger than 16 dims, we use cutlass to backpropagate through the last layer...
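The chain rule behind this question, as a small Python check assuming a sigmoid output and a squared-error loss (neither is taken from `fully_fused_mlp.cu`): the error at the pre-activation is dL/dy multiplied by f'(z), not by f(z). Note that many implementations write f' in terms of the forward output y = f(z) (for sigmoid, f' = y(1 - y)), so code that passes the activation *output* into a derivative routine can still be correct; whether that is what happens in the snippet above is not established here.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad_from_input(z: float) -> float:
    """f'(z) evaluated from the pre-activation z."""
    s = sigmoid(z)
    return s * (1.0 - s)


def sigmoid_grad_from_output(y: float) -> float:
    """The same derivative expressed through the forward output y = f(z)."""
    return y * (1.0 - y)


def loss(z: float, target: float) -> float:
    """Squared error on one output: L = 0.5 * (f(z) - target)**2."""
    return 0.5 * (sigmoid(z) - target) ** 2


def output_delta(z: float, target: float) -> float:
    """dL/dz = dL/dy * f'(z), with dL/dy = y - target."""
    y = sigmoid(z)
    return (y - target) * sigmoid_grad_from_output(y)


if __name__ == "__main__":
    z, t, h = 0.3, 1.0, 1e-5
    numeric = (loss(z + h, t) - loss(z - h, t)) / (2 * h)  # central difference
    print(output_delta(z, t), numeric)
```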

[bug] write_image_imageio() doesn't handle writing grayscale images correctly

```sh python3 samples/mlp_learning_an_image_pytorch.py # with default albert.jpg & config.json ``` gives the error: ```yml ValueError: Can't write images with one color channel. ``` It looks like write_image_imageio() in common.py doesn't handle grayscale...
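A minimal sketch of one possible fix, assuming the image reaches the writer as a NumPy array of shape (H, W, 1): drop the singleton channel so imageio receives a plain (H, W) grayscale array. The helper name `squeeze_single_channel` is hypothetical and is not the code in common.py.

```python
import numpy as np


def squeeze_single_channel(img: np.ndarray) -> np.ndarray:
    """Return an (H, W) view of an (H, W, 1) image; leave other shapes untouched."""
    if img.ndim == 3 and img.shape[-1] == 1:
        return img[..., 0]
    return img


if __name__ == "__main__":
    gray = np.zeros((4, 5, 1), dtype=np.uint8)
    print(squeeze_single_channel(gray).shape)  # (4, 5)
```

Inside a writer such as write_image_imageio(), the array would pass through this helper right before `imageio.imwrite(path, img)`; any dtype or value-range conversion the sample performs is outside the scope of this sketch.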