Douglas Lehr issues

Results: 6 issues of Douglas Lehr

hip TBE Forward Kernel

Table Batched Embedding (TBE) kernel for AMD GPUs. This forward kernel uses GCN intrinsics to pipeline the loads of embedding bag indices while accumulations and stores are being performed.

cla signed

Uvm backend

Add CUDAMallocManagedAllocator Backend. With the new CUDAAllocator class, we have created a new CUDAMallocManagedAllocator, which will handle allocator requests from both CPU and CUDA device types when the backend is...

libcustom_ops.so and libcustom_backend.so not built for ROCm

### 🐛 Describe the bug The following unit tests fail because the shared objects for custom operators and custom backends are not being built. test_pruning_op test_calling_custom_op (__main__.TestCustomOperators) test_pruning_op test_calling_custom_op_inside_script_module (__main__.TestCustomOperators)...

Unit Test Parity

Debug why unary polygamma tests are skipped in upstream CI but pass when run.

### 🐛 Describe the bug Need to find out why these passing tests are skipped in upstream CI for `test_unary_ufuncs.py` test_unary_ufuncs test_reference_numerics_hard_polygamma_polygamma_n_2_cuda_float16 (__main__.TestUnaryUfuncsCUDA) test_unary_ufuncs test_reference_numerics_hard_polygamma_polygamma_n_2_cuda_float17 (__main__.TestUnaryUfuncsCUDA) test_unary_ufuncs test_reference_numerics_hard_polygamma_polygamma_n_2_cuda_float18 (__main__.TestUnaryUfuncsCUDA) test_unary_ufuncs...

Unit Test Parity

PyTorch CudaGraph UT support on ROCm

### 🐛 Describe the bug Currently we do not have hipGraph support in PyTorch. We are working to add that implementation. In the meantime, the cudaGraph unit tests test_cuda test_graph_capture_simple...

Unit Test Parity

RPD integration into SGLANG with --profile-req support

Add support for profiling serving by request count instead of capturing the entire serving run.
