Zalman Stern issues

Results: 16 issues of Zalman Stern

Fix autoscheduler failure on image loads with single point bounds for arguments. Add tests.

This is for Andrew to look at per previous discussion.

Change an assert to match code, hopefully.

Ran into this assert. The entire approach of destructuring `For` names has to go, but the code only accesses two elements here and I'm not sure why a two-level...

User context being optional is problematic for Python extensions.

Using Halide generated code inside Python requires user_context. For JIT, this is always present. For AOT, this would require specifying the user_context option throughout the entire pipeline used inside of...

Support for ARM SVE2.

Heavily based on Steve Suzuki's work here: https://github.com/halide/Halide/pull/6781 . Hopefully easier to merge with less effect on existing ARM support and fewer constraints on CodeGen_LLVM.

halide_get_cpu_features will need to be updated for x86 AVX10 and APX.

AVX10 and APX support is being added here: https://github.com/halide/Halide/pull/8052 . This will require runtime detection of these features. Per https://github.com/halide/Halide/pull/7840 and the comment added about how to do feature detection...
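As a rough illustration of the decoding half of such runtime detection, the sketch below tests two bits of the EDX value returned by CPUID leaf 7, sub-leaf 1. The bit positions (19 for AVX10, 21 for APX_F) are my reading of Intel's public documentation and should be checked against the current manual. The struct and function names are hypothetical and are not part of Halide's runtime, and the sketch omits the OS state-save check that real detection would also need.

```cpp
#include <cstdint>

// Hypothetical result type; not Halide's actual target feature flags.
struct X86ExtendedFeatures {
    bool avx10 = false;
    bool apx_f = false;
};

// Assumed bit positions in CPUID.(EAX=7, ECX=1):EDX. Verify before relying on them.
constexpr uint32_t kAvx10Bit = 1u << 19;
constexpr uint32_t kApxFBit = 1u << 21;

// Decode an EDX value that the caller has already obtained from CPUID.
// Taking the register as a parameter keeps this function portable and testable.
inline X86ExtendedFeatures decode_leaf7_subleaf1_edx(uint32_t edx) {
    X86ExtendedFeatures features;
    features.avx10 = (edx & kAvx10Bit) != 0;
    features.apx_f = (edx & kApxFBit) != 0;
    return features;
}
```

Passing the raw register value in, rather than executing CPUID inside the function, is a design choice for this sketch only: it lets the decoding logic be exercised on any host.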

Tracking bug for SVE2 scalable vector todos.

LLVM crashes with scalable vectors that have a minimum size of 1. Some cases that would use scalable vectors are using fixed vectors to avoid this. Architecture-specific choice about whether...

Consider using llvm.vector.insert and llvm.vector.extract in slice_vector.

In the SVE2 branch, slice_vector uses the newish `llvm.vector.insert` and `llvm.vector.extract` intrinsics for the scalable vector case. We should evaluate whether this approach is better in all cases.
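For readers unfamiliar with the two intrinsics, here is a plain C++ model of their lane semantics using `std::vector`. It is only a sketch of what extract and insert do to the lanes, not LLVM API code and not Halide's `slice_vector`. The function names are hypothetical, and the model ignores the constraints the real intrinsics place on the index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Model of llvm.vector.extract: return `count` lanes of `vec` starting at lane `start`.
inline std::vector<int> extract_subvector(const std::vector<int> &vec,
                                          std::size_t start, std::size_t count) {
    assert(start + count <= vec.size());
    return std::vector<int>(vec.begin() + start, vec.begin() + start + count);
}

// Model of llvm.vector.insert: return a copy of `vec` in which the lanes
// beginning at `start` are overwritten by the lanes of `sub`.
inline std::vector<int> insert_subvector(std::vector<int> vec,
                                         const std::vector<int> &sub,
                                         std::size_t start) {
    assert(start + sub.size() <= vec.size());
    std::copy(sub.begin(), sub.end(), vec.begin() + start);
    return vec;
}
```

In this model, extracting two lanes at lane 1 from `{1, 2, 3, 4}` yields `{2, 3}`, and inserting `{9, 9}` at lane 2 of the same vector yields `{1, 2, 9, 9}`.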

Better discipline for interaction between Halide and LLVM types in CodeGen_LLVM.

This is a tracking bug recording a general issue encountered while adding SVE2 and broader scalable vector support. Currently code generation for LLVM IR works by setting the `value` member...

Add support to CodeGen_LLVM for generating architecture-specific assertions.

CodeGen_ARM, for SVE, generates a runtime check that the current processor supports the vector length compiled for. This is done in `begin_func` by checking if the current function does not...
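The kind of guard described here can be modeled in a few lines. This is a minimal sketch, assuming the check is an exact match between the vector length the code was compiled for and the length the processor reports. The function name and error code are hypothetical, and this is not the code that CodeGen_ARM emits.

```cpp
#include <cstdint>

// Hypothetical error code; Halide's runtime defines its own error values.
constexpr int kVectorLengthMismatch = -1;

// Return 0 when the processor's vector length (in bits) equals the length the
// function was compiled for, and a nonzero error otherwise.
// Assumption: an exact match is required; the real check may differ.
inline int check_vector_length(uint32_t compiled_bits, uint32_t runtime_bits) {
    return compiled_bits == runtime_bits ? 0 : kVectorLengthMismatch;
}
```

A generated entry point would run such a check once, before any vector code executes, and return the error instead of continuing. On SVE hardware the runtime length could be obtained from, for example, the ACLE `svcntb()` intrinsic, which reports the vector size in bytes.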