[FEA]: Offset parameter in load and store functions
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
High
Please provide a clear description of problem this feature solves
I’m implementing some operators with cuTile, which I find very convenient and powerful. For a 3x3 MaxPool2D operator, I want each block to load a 3x3 tile and compute its local maximum. These tiles overlap, however: the first tile and the second tile share part of their data. As far as I can tell, ct.load cannot currently express this pattern. I’d therefore like to suggest a new parameter, for example:
ct.load(array, index, shape, offset), so that the load starts at array[index * shape + offset], computed per dimension.
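To make the requested addressing concrete, here is a minimal pure-Python sketch of the semantics I have in mind. It is not cuTile code: load_with_offset is a hypothetical helper written only for illustration, and it works on nested lists rather than device arrays.

```python
def load_with_offset(array, index, shape, offset=(0, 0)):
    """Hypothetical reference semantics for the proposed offset parameter.

    The tile starts at index * shape + offset in each dimension.
    This is plain Python on nested lists, not a cuTile API.
    Out-of-bounds handling is not modeled here.
    """
    row_start = index[0] * shape[0] + offset[0]
    col_start = index[1] * shape[1] + offset[1]
    return [
        row[col_start:col_start + shape[1]]
        for row in array[row_start:row_start + shape[0]]
    ]


# 7x7 grid of labels a00..a66, matching the example in this issue.
grid = [[f"a{r}{c}" for c in range(7)] for r in range(7)]

tile0 = load_with_offset(grid, index=(0, 0), shape=(3, 3))
tile1 = load_with_offset(grid, index=(0, 0), shape=(3, 3), offset=(0, 1))
```

With offset left at (0, 0) the helper falls back to plain tile-aligned addressing, so the new parameter would be backward compatible under this interpretation.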
Feature Description
I would like an offset parameter that controls where a load starts. Assume a 2D input and 3×3 max pooling with stride 1, so each 3×3 window overlaps its horizontal neighbor by 2 columns (and its vertical neighbor by 2 rows). A small 7×7 input for illustration:
Input:

    a00 a01 a02 a03 a04 a05 a06
    a10 a11 a12 a13 a14 a15 a16
    a20 a21 a22 a23 a24 a25 a26
    a30 a31 a32 a33 a34 a35 a36
    a40 a41 a42 a43 a44 a45 a46
    a50 a51 a52 a53 a54 a55 a56
    a60 a61 a62 a63 a64 a65 a66

Tile 0 (starting at (0,0)):

    a00 a01 a02
    a10 a11 a12
    a20 a21 a22

Tile 1 (starting at (0,1)), which overlaps with Tile 0:

    a01 a02 a03
    a11 a12 a13
    a21 a22 a23
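For reference, the result I expect from the operator can be written as a small pure-Python sketch. This is only a CPU reference for checking outputs, not a cuTile kernel, and max_pool_3x3_stride1 is a name made up for this example. Each output element (r, c) is the maximum of the 3×3 window whose top-left corner is at (r, c), which is the window an offset of (r, c) would select.

```python
def max_pool_3x3_stride1(image):
    """Reference 3x3 max pooling with stride 1 and no padding.

    Plain Python on nested lists, meant only to define the expected output.
    An H x W input yields an (H - 2) x (W - 2) output.
    """
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(rows - 2):
        pooled_row = []
        for c in range(cols - 2):
            # Overlapping window with top-left corner at (r, c).
            window = [image[r + dr][c + dc] for dr in range(3) for dc in range(3)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# 7x7 input whose value at (r, c) is r * 7 + c.
image = [[r * 7 + c for c in range(7)] for r in range(7)]
pooled = max_pool_3x3_stride1(image)  # 5x5 result
```

Adjacent windows here differ by a single column or row, which is the access pattern that tile-aligned loads cannot produce without an element-level offset.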
Describe your ideal solution
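Add an offset parameter to ct.load(array, index, shape), and to the corresponding store function, so that a tile can start at an arbitrary element position instead of only at multiples of the tile shape.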
Describe any alternatives you have considered
No response
Additional context
No response
Contributing Guidelines
- [x] I agree to follow cuTile Python's contributing guidelines
- [x] I have searched the open feature requests and have found no duplicates for this feature request