[WIP] Add Paddle support
Add support for the Paddle framework in array-api-compat; this is still a work in progress.
TODO List:
- [x] https://github.com/PaddlePaddle/Paddle/pull/69632
- [x] https://github.com/PaddlePaddle/Paddle/pull/69477
Related issue: https://github.com/PaddlePaddle/Paddle/issues/68618
Cool, thanks for working on this @HydrogenSulfate!
I am curious to learn a bit more about Paddle, in particular what is conceptually supported - https://www.paddlepaddle.org.cn/documentation/docs/en/guides/jit/index_en.html and a few other guides tell me a bit, but not quite what I was most interested in. A few questions, if you don't mind:
- Is the default execution model eager or lazy/graph-based?
- It looks like there is a JIT compiler; what's the syntax, and does it work similarly to, for example, `jax.jit` or `torch.compile`?
- Are item and slice assignment supported via `__setitem__`? And indexing with a boolean mask?
- Is mixed integer and floating-point type promotion supported?
- I see it has CPU and NVIDIA GPU support, plus some other vendors of accelerators that I don't immediately recognize. Are those all GPUs as well? And ROCm and Intel XPUs are not supported (now or in the near future)?
Thanks very much for your reply and attention to this PR.

1. Paddle uses eager execution by default (eager Tensors running on a dynamic graph), and can be manually switched to a static graph (lazy Tensors running with a static computational Program) via `model = paddle.jit.to_static(model)`.
2. The usage of `paddle.jit.to_static` is very similar to `torch.compile`/`jax.jit`. When designing these interfaces, we referred to influential and great tools such as PyTorch and JAX. The workflow with `paddle.jit.*` is roughly as follows (see the first sketch after this list):
   - First, users program and train their models with dynamic graphs.
   - Second, if they need it, users can convert the model with one line of code, `model = paddle.jit.to_static(model)`, without any other modifications, turning it into a static graph model before starting training. Due to the advantages of static graph models, this usually gives a small performance improvement; the conversion has been extensively tested on our existing models, with a success rate close to 100%.
   - If there is a higher performance requirement, users can additionally enable the CINN compiler in `jit.to_static` (see the modulus-sym code for an example), which can capture the entire computation graph, including the forward pass, the backward pass, and even double-backward (or higher-order) passes, and further accelerate the program. We have tested it on 40+ models in the NVIDIA/modulus-sym suite and achieved IPS performance exceeding PyTorch by about 70% with the CINN compiler enabled (of course, this is partly because PyTorch does not seem to support capturing and compiling higher-order backward passes).
   - After training, we can save the model's computational Program via `paddle.jit.save(model, output_path)` to get a deployable model (like TensorFlow's `.pb`).
3. Item and slice assignment are supported, with broadcasting, as shown below:

   ```python
   import paddle

   x = paddle.randn([4, 3, 2])
   v = paddle.randn([3, 2])

   # item assignment
   x[0, 1] = 3.0
   print(x)

   # slice assignment with broadcasting
   x[:] = v
   print(x)

   # assignment through a boolean mask
   mask = paddle.to_tensor([True, False, True, False])
   x[mask] = paddle.zeros([3, 2])
   print(x)
   ```
4. Our implicit promotion supports fp32/fp64 and c32/c64 promotion, but does not support mixing integer and bool types (the purpose is to avoid covert conversions that are easily overlooked by users and can lead to the model giving unexpected results); the detailed table can be checked here (see the second sketch after this list):
5. We support XPU and ROCm; I will add these device types in subsequent commits.
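For context, here is a minimal, hedged sketch of the dynamic-to-static workflow from items 1-2 plus the export step; the `Net` layer, shapes, and output path are illustrative assumptions, not code from this PR:

```python
import paddle
import paddle.nn as nn

# Hypothetical small model; any nn.Layer works the same way.
class Net(nn.Layer):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        return self.fc(x)

model = Net()

# One-line switch from the default eager (dynamic-graph) mode to a
# static-graph Program, similar in spirit to torch.compile / jax.jit.
static_model = paddle.jit.to_static(model)

x = paddle.randn([4, 3])
y = static_model(x)  # traced and executed as a static Program

# Export the captured Program for deployment (comparable to TensorFlow's .pb);
# the output path is illustrative.
paddle.jit.save(
    static_model,
    "./inference/net",
    input_spec=[paddle.static.InputSpec(shape=[None, 3], dtype="float32")],
)
```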
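And a small sketch of the promotion behaviour described in item 4; the exact dtypes involved and the error raised for integer/float mixing are assumptions based on the description above, not verified against a specific Paddle release:

```python
import paddle

a = paddle.ones([2], dtype="float32")
b = paddle.ones([2], dtype="float64")

# Floating-point promotion is supported per item 4; float32 + float64 is
# expected to yield float64.
print((a + b).dtype)

# Mixing integer and floating tensors is intentionally not promoted implicitly,
# so this is expected to be rejected rather than silently cast (the concrete
# exception type may vary between releases).
i = paddle.ones([2], dtype="int32")
try:
    print((i + a).dtype)
except Exception as err:
    print("int/float mixing not implicitly promoted:", err)
```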
Thanks for the detailed explanation @HydrogenSulfate, much appreciated.
> 5. We support XPU and ROCm; I will add these device types in subsequent commits.
I'll note that I tried inferring supported devices from this Install page, where ROCm/XPU aren't yet present:
Embarrassingly, our English documentation is somewhat outdated, so you may need to use your browser's translation feature to read the Chinese documentation in English.
ROCm is used on HYGON hardware:
XPU is used on KUNLUNXIN hardware:
> Embarrassingly, our English documentation is somewhat outdated
Not embarrassing at all - we still haven't even deployed our Chinese translations on https://numpy.org/ (they're coming though!).
Thanks for the tips. Once this is ready, I'll try giving Paddle + SciPy a spin.
I converted this PR to a draft since it looks like it is still a work in progress, @HydrogenSulfate, but feel free to let us know whenever it is ready for another look!