Jiajie Li

Results 6 issues of Jiajie Li

### Profiler: Count Memory Access & Arith Ops from MLIR code

**Add feature:** Read the `.mlir` code and count memory accesses and arithmetic ops by analyzing the loop nest.

**How...
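A rough sketch of what counting over a loop nest can look like, assuming a perfectly nested loop with constant trip counts and a fixed number of loads, stores, and arithmetic ops in the innermost body. The names `LoopBody` and `count_ops` are hypothetical and are not taken from the profiler in this issue; a real implementation would walk the MLIR operations instead of taking these numbers as input.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class LoopBody:
    """Per-iteration counts for the innermost loop body (hypothetical model)."""
    loads: int
    stores: int
    arith_ops: int


def count_ops(trip_counts, body):
    """Scale per-iteration counts by the total number of iterations.

    Assumes a perfect loop nest with constant bounds, so the total number of
    iterations is the product of the trip counts.
    """
    iterations = prod(trip_counts)
    return {
        "loads": iterations * body.loads,
        "stores": iterations * body.stores,
        "arith_ops": iterations * body.arith_ops,
    }


# Example: a 32x32x32 matmul-style nest whose body does
# C[i][j] += A[i][k] * B[k][j]  ->  3 loads, 1 store, 2 arithmetic ops.
counts = count_ops([32, 32, 32], LoopBody(loads=3, stores=1, arith_ops=2))
print(counts)
```

This ignores imperfect nests, data-dependent bounds, and any reuse in registers or caches, all of which a real profiler would need to account for.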

I just found that HeteroCL's integer division truncates toward zero (-3 / 4 = 0), while Halide's rounds toward the lower integer (-3 / 4 =...

enhancement
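To make the two rounding conventions in the integer division issue concrete, here is a small Python sketch. It only illustrates the arithmetic; `div_trunc` and `div_floor` are hypothetical helper names, not HeteroCL or Halide APIs. Python's built-in `//` operator rounds toward negative infinity, so it serves as the floor-style reference here.

```python
def div_trunc(a: int, b: int) -> int:
    """Integer division that truncates toward zero (C-style)."""
    q = abs(a) // abs(b)
    # The quotient is negative only when the operands have opposite signs.
    return -q if (a < 0) != (b < 0) else q


def div_floor(a: int, b: int) -> int:
    """Integer division that rounds toward the lower integer (floor)."""
    return a // b


for a, b in [(-3, 4), (3, 4), (-7, 2), (7, -2), (8, 4)]:
    print(f"{a} / {b}: trunc={div_trunc(a, b)}, floor={div_floor(a, b)}")
```

The two conventions agree whenever the exact quotient is non-negative or the division is exact, and differ by one otherwise.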

I followed the instructions [here](https://docs.exaloop.io/codon/interoperability/decorator) to try to use `codon` in a Python codebase. I have already installed `codon` in my installation directory `MY_CODON_DIR`, and `MY_CODON_DIR/bin/codon` is the executable. I can also...

The example runs on the NCCL backend in a distributed GPU setting. I'm wondering whether it can profile correctly in a multi-node (multiple CPU servers) distributed CPU setting with Gloo...

enhancement
plugin

# What does this PR do?

- Collect CUDA memory snapshot based on the previous commit (#9096), and further analyze which parts of the model contribute to the...

core
Run CICD
audio

**Describe the bug**

**1st bug**: The 4-digit `CUTLASS_LIBRARY_INSTANTIATION_LEVEL` is not used. The documentation [here](https://github.com/NVIDIA/cutlass/blob/main/media/docs/profiler.md#instantiating-more-kernels-with-hopper) says that the CUTLASS 3.6 profiler can use an additional flag, `CUTLASS_LIBRARY_INSTANTIATION_LEVEL`, to instantiate all possible combinations. It...

bug
? - Needs Triage
inactive-30d
inactive-90d