[WIP] Add Paddle support
Add support for the Paddle framework in array-api-compat; this is still a work in progress.
TODO List:
- [x] https://github.com/PaddlePaddle/Paddle/pull/69632
- [x] https://github.com/PaddlePaddle/Paddle/pull/69477
Related issue: https://github.com/PaddlePaddle/Paddle/issues/68618
Cool, thanks for working on this @HydrogenSulfate!
I am curious to learn a bit more about Paddle, in particular what is conceptually supported - https://www.paddlepaddle.org.cn/documentation/docs/en/guides/jit/index_en.html and a few other guides tell me a bit, but not quite what I was most interested in. A few questions, if you don't mind:
- Is the default execution model eager or lazy/graph-based?
- It looks like there is a JIT compiler; what's the syntax, and does it work similarly to, for example, `jax.jit` or `torch.compile`?
- Are item and slice assignment supported via `__setitem__`? And indexing with a boolean mask?
- Is mixed integer and floating-point type promotion supported?
- I see it has CPU and NVIDIA GPU support, plus some other vendors of accelerators that I don't immediately recognize. Are those all GPUs as well? And ROCm and Intel XPUs are not supported (now or in the near future)?
Thanks very much for your reply and attention to this PR.

1. Paddle uses eager execution by default (eager Tensors running on a dynamic graph), and can be manually switched to a static graph (lazy Tensors running with a static computational Program) via `model = paddle.jit.to_static(model)`.
2. The usage of `paddle.jit.to_static` is very similar to `torch.compile`/`jax.jit`. When designing these interfaces, we referred to influential and great tools such as PyTorch and JAX. The workflow with `paddle.jit.*` is roughly as follows (see the first sketch after this list):
   - First, users program and train their models with dynamic graphs.
   - Second, if they need it, users can convert the model with one line of code, `model = paddle.jit.to_static(model)`, without any other modifications, turning it into a static graph model before starting training. Due to the advantages of static graph models, this usually gives a small performance improvement; the conversion has been extensively tested on our existing models, with a success rate close to 100%.
   - If there is a higher performance requirement, users can additionally enable the CINN compiler in `jit.to_static` (see the modulus-sym code for an example), which can capture the entire computation graph, including the forward pass, the backward pass, and even double-backward (or higher-order) passes, and further accelerate the program. We have tested it on 40+ models in the NVIDIA/modulus-sym suite and achieved IPS performance exceeding PyTorch by about 70% with the CINN compiler enabled (of course, this is partly because PyTorch does not seem to support capturing and compiling higher-order backward passes).
   - After training, we can save the model's computational Program via `paddle.jit.save(model, output_path)` to get a deployable model (like TensorFlow's `.pb`).
3. Item and slice assignment are supported, with broadcasting, as shown below:

   ```python
   import paddle

   x = paddle.randn([4, 3, 2])
   v = paddle.randn([3, 2])

   # item assignment
   x[0, 1] = 3.0
   print(x)

   # slice assignment with broadcasting
   x[:] = v
   print(x)

   # assignment through a boolean mask
   mask = paddle.to_tensor([True, False, True, False])
   x[mask] = paddle.zeros([3, 2])
   print(x)
   ```
4. Our implicit promotion supports fp32/fp64 and c32/c64 promotion, but does not support mixing integer and bool types (the purpose is to avoid covert conversions that are easily overlooked by users and can lead to the model giving unexpected results); the detailed table can be checked here (see the second sketch after this list):
5. We support XPU and ROCm; I will add these device types in subsequent commits.
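For context, here is a minimal, hedged sketch of the dynamic-to-static workflow from items 1-2 plus the export step; the `Net` layer, shapes, and output path are illustrative assumptions, not code from this PR:

```python
import paddle
import paddle.nn as nn

# Hypothetical small model; any nn.Layer works the same way.
class Net(nn.Layer):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        return self.fc(x)

model = Net()

# One-line switch from the default eager (dynamic-graph) mode to a
# static-graph Program, similar in spirit to torch.compile / jax.jit.
static_model = paddle.jit.to_static(model)

x = paddle.randn([4, 3])
y = static_model(x)  # traced and executed as a static Program

# Export the captured Program for deployment (comparable to TensorFlow's .pb);
# the output path is illustrative.
paddle.jit.save(
    static_model,
    "./inference/net",
    input_spec=[paddle.static.InputSpec(shape=[None, 3], dtype="float32")],
)
```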
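And a small sketch of the promotion behaviour described in item 4; the exact dtypes involved and the error raised for integer/float mixing are assumptions based on the description above, not verified against a specific Paddle release:

```python
import paddle

a = paddle.ones([2], dtype="float32")
b = paddle.ones([2], dtype="float64")

# Floating-point promotion is supported per item 4; float32 + float64 is
# expected to yield float64.
print((a + b).dtype)

# Mixing integer and floating tensors is intentionally not promoted implicitly,
# so this is expected to be rejected rather than silently cast (the concrete
# exception type may vary between releases).
i = paddle.ones([2], dtype="int32")
try:
    print((i + a).dtype)
except Exception as err:
    print("int/float mixing not implicitly promoted:", err)
```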
Thanks for the detailed explanation @HydrogenSulfate, much appreciated.
> 5. We support XPU and ROCm; I will add these device types in subsequent commits.
I'll note that I tried inferring supported devices from this Install page, where ROCm/XPU aren't yet present:
Embarrassingly, our English documentation is somewhat outdated, so you may need to use your browser's translation feature to read the Chinese documentation in English.
ROCm is used on HYGON hardware:
XPU is used on KUNLUNXIN hardware:
> Embarrassingly, our English documentation is somewhat outdated
Not embarrassing at all - we still haven't even deployed our Chinese translations on https://numpy.org/ (they're coming though!).
Thanks for the tips. Once this is ready, I'll try giving Paddle + SciPy a spin.
I converted this PR to a draft since it looks like it is still a work in progress, @HydrogenSulfate, but feel free to let us know whenever it is ready for another look!