dan_the_3rd issues

Results: 14 issues of dan_the_3rd

repeat with flexible shape (EnumeratedShapes)

🐞 The `repeat` operation does not work with dynamic inputs. See code: `repeat` only seems to work with constant values for the reps. The same problem happens...

bug

triaged

pytorch

Flexable Shape

[BUG] Fused GEMM example gives wrong result with some shapes

**Describe the bug** Fused GEMM example gives the wrong result for some values of `problemSize1.K`. **Steps/Code to reproduce bug** Set the following problem sizes in `examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm80_shmem.cu` ```c++ cutlass::gemm::GemmCoord gemm_f16_sm80_problem_size_0(128*640, 48,...

bug

? - Needs Triage

TCP tunnels can freeze terminal

Hello, When transferring a lot of data over a TCP tunnel, the terminal freezes during the transfer (also happens to me for regular SSH tunnels), and sometimes for even longer...

Enabling HTTP caching of inference results

## Is your feature request related to a problem? Please describe. Currently, TorchServe adds headers that prevent the inference results from being cached: https://github.com/pytorch/serve/blob/30f83500b0850e26ec55581f48a9307b1986f9f9/frontend/server/src/main/java/org/pytorch/serve/util/NettyUtils.java#L187-L190 This prevents some reverse proxies, such as `nginx`, from...

enhancement

help wanted

Add unit type Special_Map_Revealer

CLA Signed

MmaFromSmem[A100]: Accept transposed operand A

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #540 * #539

CLA Signed

Improve build time by ~30%

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #540 * __->__ #539 ... by reducing the number of ATen imports, and skipping them altogether when building the actual kernels 13mn ->...

CLA Signed

Update FlashAttention

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #495 cc @tridao

CLA Signed

MemEff: Accumulate in f32 for bw

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #467 **PERFORMANCE** This makes performance worse in f16 :( But I think we need it for stability bw P100/V100 (f32/f16) ``` [----------------------------------------...

CLA Signed

[CI] Ensure we don't break windows build

CLA Signed