[Usage]: Does Mooncake currently support npu->gpu transport?

Open artetaout opened this issue 5 months ago • 5 comments

Describe your usage question

Mooncake v0.3.7.post2. Following https://kvcache-ai.github.io/Mooncake/getting_started/quick-start.html#transfer-engine-quick-start, I changed the client_buffer to a tensor and the server_buffer to a GPU tensor, and ended up with a core dump. So is npu->gpu currently supported?

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues and read the documentation

artetaout avatar Nov 04 '25 09:11 artetaout

The current Mooncake typically doesn't support heterogeneous transports (e.g. sender with Ascend NPUs, receiver with NV GPUs).

alogfans avatar Nov 05 '25 01:11 alogfans

Try this: https://github.com/kvcache-ai/Mooncake/blob/main/doc/zh/heterogeneous_ascend.md

ShangmingCai avatar Nov 05 '25 07:11 ShangmingCai

@ShangmingCai could you help figure out why "--device_name=mlx5_1" is needed in the command "./transfer_engine_heterogeneous_ascend_perf_initiator --mode=initiator ... --device_name=mlx5_1"? Is there a Mellanox (mlx) NIC on the 910B server?

Vikram111-pix avatar Nov 06 '25 09:11 Vikram111-pix

@Vikram111-pix I assume it could be the config for the H20 side.

@zuochunwei Can you help explain?

ShangmingCai avatar Nov 06 '25 09:11 ShangmingCai

@ShangmingCai thanks for your reply. In the link you pasted, both the target and the initiator need the argument "--device_name=mlx5_1". From the heterogeneous_ascend code, it first copies the data from HBM to DRAM, then uses the RDMA transport to write the data to the remote target, so mlx5_1 is what gets passed to the RDMA transport. But I am not sure whether there is any Mellanox NIC on a 910B server.

@zuochunwei could you pls help clarify this? thanks

Vikram111-pix avatar Nov 09 '25 12:11 Vikram111-pix
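
For readers following along, the two-hop path described in the last comment (HBM to host DRAM, then an RDMA write to the remote target) can be pictured with a small simulation. This is only a sketch and not Mooncake's implementation: plain Python byte buffers stand in for NPU HBM, the host staging buffer, and the remote GPU memory. The function name `staged_write` and the `chunk_size` parameter are hypothetical, and the chunking itself is an assumption, since the thread does not say how the real code sizes its staging buffer.

```python
# Illustrative simulation only. No Mooncake, Ascend, or RDMA API is used:
# bytes/bytearray objects stand in for device, host, and remote memory.

def staged_write(device_mem: bytes, remote_mem: bytearray, chunk_size: int) -> int:
    """Copy device_mem into remote_mem through a fixed-size host bounce buffer.

    Returns the number of chunks that passed through the bounce buffer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(remote_mem) < len(device_mem):
        raise ValueError("remote buffer is smaller than the source")

    bounce = bytearray(chunk_size)  # hypothetical host DRAM staging buffer
    chunks = 0
    for offset in range(0, len(device_mem), chunk_size):
        n = min(chunk_size, len(device_mem) - offset)
        # Hop 1: "HBM -> DRAM" (stands in for a device-to-host copy).
        bounce[:n] = device_mem[offset:offset + n]
        # Hop 2: "DRAM -> remote" (stands in for an RDMA write, which would
        # go out through the NIC selected by --device_name).
        remote_mem[offset:offset + n] = bounce[:n]
        chunks += 1
    return chunks


if __name__ == "__main__":
    src = bytes(range(256)) * 4      # pretend NPU-side data, 1024 bytes
    dst = bytearray(len(src))        # pretend remote GPU-side buffer
    used = staged_write(src, dst, chunk_size=300)
    print(used, bytes(dst) == src)
```

In this picture only the second hop touches the network, which is consistent with the observation above that the device name ends up being passed to the RDMA transport.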