[ADRENO][TEXTURE] Texture based lowering
This PR introduces the following features on top of texture annotation:
- Lowering, codegen, and runtime support for textures.
- image2d_array_t support: the added depth dimension allows more allocations to use textures instead of falling back to buffers when 2D texture limits are exceeded.
- A comprehensive set of schedules for Adreno textures.
- Texture packing of arbitrary types up to 128 bits (FP16-NCHW8c, INT8-NCHW16c, etc.); see the packing sketch after this list.
- A clBufferDescriptor debug dump controlled by CMake options.
- Pipeline definition for the Adreno target.
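The 128-bit packing works out as follows (a minimal sketch; the helper name and the fixed 128-bit texel width are illustrative assumptions, not code from this PR):

```python
def packing_lanes(dtype_bits: int, texel_bits: int = 128) -> int:
    # Number of elements packed into one texture texel.
    assert texel_bits % dtype_bits == 0
    return texel_bits // dtype_bits

assert packing_lanes(16) == 8   # FP16 -> 8 lanes, i.e. NCHW8c
assert packing_lanes(8) == 16   # INT8 -> 16 lanes, i.e. NCHW16c
```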
While covering these features, the interfaces and passes below are enhanced and need review.
- alloc_tensor: VDevice information is passed across these APIs. Texture allocation works as follows: alloc_storage allocates buffer/image objects as requested, and alloc_tensor then creates a view over that storage in any scope. This takes care of optimally utilizing the backing memory across different image objects and scopes (see the sketch after this list).
- Constants saving: handled by adding a memory-scope section to the executable. This introduces a new header magic to retain backward compatibility.
- Static memory planning: mostly a port of the Relay static memory planner, with a mixed-mode allocator.
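For reviewers, a minimal TVMScript sketch of the alloc_storage/alloc_tensor pattern described above, written against the existing R.memory builtins; the shapes, offset, byte size, and the "global.texture" scope string are illustrative assumptions, not code from this PR:

```python
from tvm.script import relax as R


@R.function
def alloc_example() -> R.Tensor((1, 16, 56, 56, 4), "float32"):
    R.func_attr({"relax.force_pure": True})  # allocation builtins are impure
    # alloc_storage reserves the backing buffer/image object in the
    # requested scope (scope string and byte size are illustrative) ...
    storage: R.Object = R.memory.alloc_storage(
        R.shape([802816]), virtual_device_index=0,
        storage_scope="global.texture", dtype="uint8",
    )
    # ... and alloc_tensor carves a scoped view out of it, so one storage
    # can back multiple image objects / scopes.
    t = R.memory.alloc_tensor(
        storage, 0, R.shape([1, 16, 56, 56, 4]), dtype="float32"
    )
    return t
```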
Summary of Changes
Hello @srkreddy1238, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant overhaul of TVM's Adreno backend, primarily by enabling and optimizing texture-based lowering. The changes span the runtime, Relax, and TIR components to ensure that GPU texture memory can be used effectively for operations such as convolutions, pooling, and layout transformations. This aims to improve memory efficiency and performance on Adreno devices by providing dedicated schedules and memory management for texture objects, alongside robust mechanisms for propagating memory-scope information throughout the compilation pipeline.
Highlights
- Texture Annotation and Lowering: Introduces comprehensive support for texture annotation, lowering, codegen, and runtime specifically for Adreno GPUs. This enables more efficient memory utilization by leveraging texture memory instead of falling back to buffers when limits are exceeded.
- image2d_array_t Support: Adds support for `image2d_array_t`, which includes a depth dimension, allowing for more flexible and larger texture allocations, particularly beneficial for NCHW layouts.
- Adreno Texture Schedules: A comprehensive set of DLight schedules for Adreno textures has been added, including specialized rules for `Conv2d`, `LayoutTransform`, and `Pool2D`, and a `Fallback` mechanism for general operations.
- Texture Packing: Enables texture packing of arbitrary data types up to 128 bits, supporting formats like FP16-NCHW8c and INT8-NCHW16c, which are crucial for optimizing performance on Adreno GPUs.
- Memory Scope Propagation: Enhances `runtime.Tensor` with `SetScope` and `GetScope` methods, and updates `SaveDLTensor`/`Load` to preserve memory scope information. This ensures that memory allocation decisions, especially for textures, are correctly propagated through the Relax and TIR pipelines.
- Static Memory Planning Integration: The static memory planner has been updated to account for texture memory scopes and sizes, porting concepts from Relay's static memory planner with a mixed-mode allocator to better manage device-specific memory.
- New TIR Passes: Introduces the `InjectTextureAlloc` and `TextureFlatten` TIR passes. `InjectTextureAlloc` inserts texture allocation intrinsics, while `TextureFlatten` transforms multi-dimensional buffer accesses into 2D (width, height, depth) texture accesses based on storage scope (see the sketch after this list).
- OpenCL Codegen and Runtime Updates: Updates the OpenCL codegen to correctly handle `image2d_array_t` types and `texture2d_load`/`texture2d_store` intrinsics, using `int4` for coordinates and managing channel sizes. The OpenCL runtime now supports allocating `image2d_array_t` with depth and calculates texture memory sizes based on device attributes.
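To make the `TextureFlatten` bullet concrete, below is one plausible mapping from a packed 5-D NCHW4c buffer index to `image2d_array_t` (width, height, depth) coordinates. This is an illustrative assumption about the flattening scheme, not the pass's actual code:

```python
def nchw4c_to_image2d_array(n: int, c_outer: int, h: int, w: int,
                            C_outer: int) -> tuple[int, int, int]:
    """Map a (n, c_outer, h, w, 4c) element access to texture coordinates.

    Illustrative only: x indexes the image width, y the height, and layer
    the depth slice; the 4 inner channels land in the RGBA components of
    a single texel.
    """
    x = w                          # width coordinate
    y = h                          # height coordinate
    layer = n * C_outer + c_outer  # depth slice per (batch, channel-block)
    return x, y, layer

# Element (n=0, c_outer=3, h=5, w=7) of a (1, 16, 56, 56, 4) tensor:
print(nchw4c_to_image2d_array(0, 3, 5, 7, C_outer=16))  # -> (7, 5, 3)
```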
@tqchen Managed without altering `runtime::Tensor`.
The VM bytecode now adds a MemoryScope section.
To keep backward compatibility of `Load`, I added a new header magic, `kTVMVMBytecodeMagicV2`. Let me know if this needs to be handled via `VM_VERSION` instead.
Q not related directly to this PR, sorry. Can TVM and Adreno flow be compiled/executed on Snapdragon X1? Windows/Linux?
> Q not related directly to this PR, sorry. Can TVM and Adreno flow be compiled/executed on Snapdragon X1? Windows/Linux?
Yes. It works on Snapdragon laptops (X Elite) with both arm64 builds (arm64 needs the Qualcomm Arm64 OpenCL SDK) and x64 builds (generic OpenCL works without CLML enabled).
Sample config (PowerShell):

x64 build:
```powershell
# Run from a fresh build directory (e.g. build-x64); OPENCL_SDK_ADRENO_X86
# and LLVM_CONFIG are environment variables pointing at the SDK and
# llvm-config paths.
cp ../cmake/config.cmake .
Add-Content config.cmake "set(USE_OPENCL $ENV:OPENCL_SDK_ADRENO_X86)"
Add-Content config.cmake "set(USE_LLVM $ENV:LLVM_CONFIG)"
Add-Content config.cmake "set(USE_CLML $ENV:OPENCL_SDK_ADRENO_X86)"
Add-Content config.cmake "set(USE_RPC ON)"
Add-Content config.cmake "set(USE_CPP_RPC ON)"
Add-Content config.cmake "set(USE_KALLOC_ALIGNMENT 32)"
Add-Content config.cmake "set(USE_OPENCL_EXTN_QCOM ON)"
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release --parallel $env:NUMBER_OF_PROCESSORS
```
arm64 build (tvm_rpc target):
```powershell
cd build-arm64
cp ../cmake/config.cmake .
Add-Content config.cmake "set(USE_OPENCL $ENV:OPENCL_SDK_ADRENO_ARM64)"
Add-Content config.cmake "set(USE_CLML $ENV:OPENCL_SDK_ADRENO_ARM64)"
Add-Content config.cmake "set(USE_RPC ON)"
Add-Content config.cmake "set(USE_CPP_RPC ON)"
Add-Content config.cmake "set(USE_KALLOC_ALIGNMENT 32)"
Add-Content config.cmake "set(USE_OPENCL_EXTN_QCOM ON)"
cmake .. -G "Visual Studio 17 2022" -A ARM64
cmake --build . --config Release --parallel $env:NUMBER_OF_PROCESSORS --target tvm_rpc
```
mlc-llm works too.