Po Yen Chen issues

Results: 13 issues of Po Yen Chen

Refactor ck_tile fMHA forward example

Check window lengths in tile_window_with_static_distribution<> ctor

Generate add_device_xxxxx_instances() declarations by CMake or script

When adding a new type of device operator instance, we also have to add the corresponding `add_device_xxxx_instances()` declarations to the header. This is error-prone and time-consuming. ```c++ // file: library/include/ck/library/tensor_operation_instance/gpu/gemm.hpp namespace ck...

Let ck::Array<> derive from std::array<>

`ck::Array` and `std::array` behave the same. The only difference between the two types is that the former has a templated assignment operator. I think `ck::Array` can be used in most use...
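The idea can be illustrated with a minimal sketch. It is not the actual `ck::Array` definition: the `sketch` namespace and the exact shape of the templated assignment operator are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch; NOT the real ck::Array.
namespace sketch {

template <typename T, std::size_t N>
struct Array : std::array<T, N>
{
    // Templated assignment (assumed shape): copy element-wise from any
    // container that provides size() and operator[].
    template <typename Other>
    Array& operator=(const Other& other)
    {
        for(std::size_t i = 0; i < N && i < other.size(); ++i)
        {
            (*this)[i] = other[i];
        }
        return *this;
    }
};

} // namespace sketch
```

With this approach the derived type inherits `size()`, `operator[]`, iterators and the rest of the `std::array` interface, so only the extra assignment operator has to be maintained.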

Extract common code from example/test/profiler

There is a lot of duplicated code in the implementations, such as the `HostTensorDescriptor` creation logic. ```c++ auto f_host_tensor_descriptor1d = [](std::size_t len, std::size_t stride) { return HostTensorDescriptor({len}, {stride}); }; auto f_host_tensor_descriptor2d = [](std::size_t...
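One possible shape for the extracted helpers is sketched below. `Descriptor` is a hypothetical stand-in for `HostTensorDescriptor`, and the helper names and the `row_major` flag are assumptions, since the 2D lambda is truncated in the original snippet.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of shared helpers; names are illustrative only.
namespace sketch {

// Stand-in for HostTensorDescriptor: only stores lengths and strides.
struct Descriptor
{
    std::vector<std::size_t> lengths;
    std::vector<std::size_t> strides;
};

// 1D descriptor, mirroring the lambda quoted in the issue.
inline Descriptor make_descriptor_1d(std::size_t len, std::size_t stride)
{
    return Descriptor{{len}, {stride}};
}

// 2D descriptor. Assumption: row-major puts the stride on the row
// dimension, column-major puts it on the column dimension.
inline Descriptor make_descriptor_2d(std::size_t rows,
                                     std::size_t cols,
                                     std::size_t stride,
                                     bool row_major)
{
    if(row_major)
    {
        return Descriptor{{rows, cols}, {stride, std::size_t{1}}};
    }
    return Descriptor{{rows, cols}, {std::size_t{1}, stride}};
}

} // namespace sketch
```

Placing such helpers in one shared header would let the example, test, and profiler code call the same functions instead of redefining the lambdas locally.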

Place ckProfiler headers into include/ck/profiler

Currently we put headers into _include/**ck/xxxxx**_ sub-directories, except for _ckProfiler_: ```console $ tree library/include/ -L 3 library/include/ └── ck └── library ├── reference_tensor_operation ├── tensor_operation_instance └── utility $ tree profiler/include/ -L...

Enumerate source files automatically (as add_library()/add_executable() argument)

For targets like _ckProfiler_, I found that the existing source files and the `add_executable()` arguments are identical. We can see the same symptom in the instance libraries: - First argument of...

[CK_TILE] Add appendkv kernel to support mha with kvcache

Add a new `fmha_fwd_appendkv()` API which runs ahead of the `fmha_fwd()`/`fmha_fwd_splitkv()` API. The `fmha_fwd_appendkv()` + `fmha_fwd()`/`fmha_fwd_splitkv()` combination implements the functionality of `mha_fwd_kvcache()` in FA 2.5 (without the paged-kvcache part).
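The append step can be pictured with a small host-side sketch. This is not the ck_tile kernel or its API: `KvCache`, `make_cache()`, and `append_rows()` are hypothetical names, and the flat row-major layout is an assumption. It only shows the ordering: new key/value rows are written to the end of the cache first, and the attention forward pass then reads the extended cache.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side sketch; NOT the ck_tile implementation.
namespace sketch {

struct KvCache
{
    std::size_t capacity; // maximum number of rows (tokens)
    std::size_t head_dim; // elements per row
    std::size_t seqlen;   // number of valid rows so far
    std::vector<float> data; // capacity * head_dim, row-major (assumed)
};

inline KvCache make_cache(std::size_t capacity, std::size_t head_dim)
{
    if(head_dim == 0)
    {
        throw std::invalid_argument("head_dim must be positive");
    }
    return KvCache{capacity, head_dim, 0, std::vector<float>(capacity * head_dim, 0.f)};
}

// Copy the new rows behind the valid part of the cache and return the
// updated sequence length. Attention would then run over [0, seqlen).
inline std::size_t append_rows(KvCache& cache, const std::vector<float>& new_rows)
{
    if(new_rows.size() % cache.head_dim != 0)
    {
        throw std::invalid_argument("new_rows is not a whole number of rows");
    }
    const std::size_t n = new_rows.size() / cache.head_dim;
    if(cache.seqlen + n > cache.capacity)
    {
        throw std::length_error("cache capacity exceeded");
    }
    const auto offset = static_cast<std::ptrdiff_t>(cache.seqlen * cache.head_dim);
    std::copy(new_rows.begin(), new_rows.end(), cache.data.begin() + offset);
    cache.seqlen += n;
    return cache.seqlen;
}

} // namespace sketch
```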

[CK_TILE] Pick bugfixes for ROCm 6.2 compiler issues

[CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels

Use the same tensor layout for `o_acc` & `o`.