[ADRENO][TEXTURE] Texture based lowering
This PR introduces the following features on top of texture annotation:
- Lowering, codegen, and runtime support for textures.
- image2d_array_t support: the added depth dimension allows more allocations to use textures instead of falling back to buffers when 2D texture limits are exceeded.
- A comprehensive set of schedules for Adreno textures.
- Texture packing of arbitrary types up to 128 bits (FP16-NCHW8c, INT8-NCHW16c, etc.); see the packing sketch after this list.
- A clBufferDescriptor debug dump controlled by CMake options.
- Pipeline definition for the Adreno target.
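The 128-bit packing works out as follows (a minimal sketch; the helper name and the fixed 128-bit texel width are illustrative assumptions, not code from this PR):

```python
def packing_lanes(dtype_bits: int, texel_bits: int = 128) -> int:
    # Number of elements packed into one texture texel.
    assert texel_bits % dtype_bits == 0
    return texel_bits // dtype_bits

assert packing_lanes(16) == 8   # FP16 -> 8 lanes, i.e. NCHW8c
assert packing_lanes(8) == 16   # INT8 -> 16 lanes, i.e. NCHW16c
```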
While covering these features, the interfaces and passes below are enhanced and need review.
- alloc_tensor: VDevice information is passed across these APIs. Texture allocation works as follows: alloc_storage allocates buffer/image objects as requested, and alloc_tensor then creates a view over that storage in any scope. This takes care of optimally utilizing the backing memory across different image objects and scopes (see the sketch after this list).
- Constants saving: handled by adding a memory-scope section to the executable. This introduces a new header magic to retain backward compatibility.
- Static memory planning: mostly a port of the Relay static memory planner, with a mixed-mode allocator.
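For reviewers, a minimal TVMScript sketch of the alloc_storage/alloc_tensor pattern described above, written against the existing R.memory builtins; the shapes, offset, byte size, and the "global.texture" scope string are illustrative assumptions, not code from this PR:

```python
from tvm.script import relax as R


@R.function
def alloc_example() -> R.Tensor((1, 16, 56, 56, 4), "float32"):
    R.func_attr({"relax.force_pure": True})  # allocation builtins are impure
    # alloc_storage reserves the backing buffer/image object in the
    # requested scope (scope string and byte size are illustrative) ...
    storage: R.Object = R.memory.alloc_storage(
        R.shape([802816]), virtual_device_index=0,
        storage_scope="global.texture", dtype="uint8",
    )
    # ... and alloc_tensor carves a scoped view out of it, so one storage
    # can back multiple image objects / scopes.
    t = R.memory.alloc_tensor(
        storage, 0, R.shape([1, 16, 56, 56, 4]), dtype="float32"
    )
    return t
```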
Summary of Changes
Hello @srkreddy1238, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant overhaul of TVM's Adreno backend, primarily by enabling and optimizing texture-based lowering. The changes span the runtime, Relax, and TIR components to ensure that GPU texture memory can be used effectively for operations such as convolutions, pooling, and layout transformations. This aims to improve memory efficiency and performance on Adreno devices by providing dedicated schedules and memory management for texture objects, alongside robust mechanisms for propagating memory-scope information throughout the compilation pipeline.
Highlights
- Texture Annotation and Lowering: Introduces comprehensive support for texture annotation, lowering, codegen, and runtime specifically for Adreno GPUs. This enables more efficient memory utilization by leveraging texture memory instead of falling back to buffers when limits are exceeded.
- image2d_array_t Support: Adds support for `image2d_array_t`, which includes a depth dimension, allowing for more flexible and larger texture allocations, particularly beneficial for NCHW layouts.
- Adreno Texture Schedules: A comprehensive set of DLight schedules for Adreno textures has been added, including specialized rules for `Conv2d`, `LayoutTransform`, and `Pool2D`, and a `Fallback` mechanism for general operations.
- Texture Packing: Enables texture packing of arbitrary data types up to 128 bits, supporting formats like FP16-NCHW8c and INT8-NCHW16c, which are crucial for optimizing performance on Adreno GPUs.
- Memory Scope Propagation: Enhances `runtime.Tensor` with `SetScope` and `GetScope` methods, and updates `SaveDLTensor`/`Load` to preserve memory scope information. This ensures that memory allocation decisions, especially for textures, are correctly propagated through the Relax and TIR pipelines.
- Static Memory Planning Integration: The static memory planner has been updated to account for texture memory scopes and sizes, porting concepts from Relay's static memory planner with a mixed-mode allocator to better manage device-specific memory.
- New TIR Passes: Introduces the `InjectTextureAlloc` and `TextureFlatten` TIR passes. `InjectTextureAlloc` inserts texture allocation intrinsics, while `TextureFlatten` transforms multi-dimensional buffer accesses into 2D (width, height, depth) texture accesses based on storage scope (see the sketch after this list).
- OpenCL Codegen and Runtime Updates: Updates the OpenCL codegen to correctly handle `image2d_array_t` types and `texture2d_load`/`texture2d_store` intrinsics, using `int4` for coordinates and managing channel sizes. The OpenCL runtime now supports allocating `image2d_array_t` with depth and calculates texture memory sizes based on device attributes.
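To make the `TextureFlatten` bullet concrete, below is one plausible mapping from a packed 5-D NCHW4c buffer index to `image2d_array_t` (width, height, depth) coordinates. This is an illustrative assumption about the flattening scheme, not the pass's actual code:

```python
def nchw4c_to_image2d_array(n: int, c_outer: int, h: int, w: int,
                            C_outer: int) -> tuple[int, int, int]:
    """Map a (n, c_outer, h, w, 4c) element access to texture coordinates.

    Illustrative only: x indexes the image width, y the height, and layer
    the depth slice; the 4 inner channels land in the RGBA components of
    a single texel.
    """
    x = w                          # width coordinate
    y = h                          # height coordinate
    layer = n * C_outer + c_outer  # depth slice per (batch, channel-block)
    return x, y, layer

# Element (n=0, c_outer=3, h=5, w=7) of a (1, 16, 56, 56, 4) tensor:
print(nchw4c_to_image2d_array(0, 3, 5, 7, C_outer=16))  # -> (7, 5, 3)
```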
@tqchen Managed without altering `runtime::Tensor`.
The VM bytecode now adds a MemoryScope section.
To keep backward compatibility of `Load`, I added a new header magic, `kTVMVMBytecodeMagicV2`. Let me know if this needs to be handled via `VM_VERSION` instead.
Q not related directly to this PR, sorry. Can TVM and Adreno flow be compiled/executed on Snapdragon X1? Windows/Linux?
> Q not related directly to this PR, sorry. Can TVM and Adreno flow be compiled/executed on Snapdragon X1? Windows/Linux?
Yes. It works on Snapdragon laptops (X Elite) with both arm64 builds (arm64 needs the Qualcomm Arm64 OpenCL SDK) and x64 builds (generic OpenCL works without CLML enabled).
Sample config (PowerShell):

x64 build:
```powershell
# Run from a fresh build directory (e.g. build-x64); OPENCL_SDK_ADRENO_X86
# and LLVM_CONFIG are environment variables pointing at the SDK and
# llvm-config paths.
cp ../cmake/config.cmake .
Add-Content config.cmake "set(USE_OPENCL $ENV:OPENCL_SDK_ADRENO_X86)"
Add-Content config.cmake "set(USE_LLVM $ENV:LLVM_CONFIG)"
Add-Content config.cmake "set(USE_CLML $ENV:OPENCL_SDK_ADRENO_X86)"
Add-Content config.cmake "set(USE_RPC ON)"
Add-Content config.cmake "set(USE_CPP_RPC ON)"
Add-Content config.cmake "set(USE_KALLOC_ALIGNMENT 32)"
Add-Content config.cmake "set(USE_OPENCL_EXTN_QCOM ON)"
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release --parallel $env:NUMBER_OF_PROCESSORS
```
arm64 build (tvm_rpc target):
```powershell
cd build-arm64
cp ../cmake/config.cmake .
Add-Content config.cmake "set(USE_OPENCL $ENV:OPENCL_SDK_ADRENO_ARM64)"
Add-Content config.cmake "set(USE_CLML $ENV:OPENCL_SDK_ADRENO_ARM64)"
Add-Content config.cmake "set(USE_RPC ON)"
Add-Content config.cmake "set(USE_CPP_RPC ON)"
Add-Content config.cmake "set(USE_KALLOC_ALIGNMENT 32)"
Add-Content config.cmake "set(USE_OPENCL_EXTN_QCOM ON)"
cmake .. -G "Visual Studio 17 2022" -A ARM64
cmake --build . --config Release --parallel $env:NUMBER_OF_PROCESSORS --target tvm_rpc
```
mlc-llm works too.