
Questions about SHAVE Data Backup Storage and Cache Coherency

Open Kepontry opened this issue 9 months ago • 5 comments

In the 2025.18 release, I noticed that the Act Runtime now supports SHAVE execution directly from DDR, rather than copying data from DDR to CMX via DMA. I'm curious if there are any tests or benchmarks comparing the performance of these two approaches, and under what circumstances one method is preferred over the other.

Additionally, is the SLC in Lunar Lake used to back up the SHAVE L2 cache? How is cache coherence maintained between the LLC and the SHAVE L2 cache?

Kepontry avatar May 03 '25 14:05 Kepontry

Hello @Kepontry!

Thank you for reaching out!

> I noticed that the Act Runtime now supports SHAVE execution directly from DDR, rather than copying data from DDR to CMX via DMA. I'm curious if there are any tests or benchmarks comparing the performance of these two approaches, and under what circumstances one method is preferred over the other.

I followed up with the author of this feature and will get back to you as soon as possible.

DariaMityagina avatar May 08 '25 13:05 DariaMityagina

Hi Daria, any updates regarding this question? Any insights would be greatly appreciated!

Kepontry avatar May 16 '25 16:05 Kepontry

> I noticed that the Act Runtime now supports SHAVE execution directly from DDR, rather than copying data from DDR to CMX via DMA. I'm curious if there are any tests or benchmarks comparing the performance of these two approaches, and under what circumstances one method is preferred over the other.

Thank you for your question and patience.

While the API contract for this change has been merged (it is visible in the headers), the implementation of those API functions in the module responsible for executing tasks from DDR has not yet been released; the two are delivered in separate steps. Moreover, the current OpenVINO release does not support compiler-generated schedules for executing SHAVE tasks from DDR.

DariaMityagina avatar May 16 '25 16:05 DariaMityagina

> Additionally, is the SLC in Lunar Lake used to back up the SHAVE L2 cache? How is cache coherence maintained between the LLC and the SHAVE L2 cache?

As for this part: the L2 cache (L2C) currently caches DDR accesses from the SHAVEs. Each concurrent user receives a SHAVE L2 partition for data and for instructions. At the start of an inference all partitions are empty, and they are flushed upon completion.

Throughout inference execution, the compiler maintains cache coherence by inserting cache-handling operations into the inference schedule.

The details can be found here: VPUIP/transforms/passes/add_sw_kernel_cache_handling_ops.cpp
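For illustration only, here is a minimal sketch of what such a pass conceptually does. The `Task`/`OpKind` types and the `insertCacheHandlingOps` function are invented for this example and are not the actual VPUIP API; the real logic lives in the pass referenced above.

```cpp
// Illustrative sketch only, not the actual VPUIP pass.
#include <iostream>
#include <string>
#include <vector>

enum class OpKind { SwKernel, Dma, CacheFlush, CacheInvalidate };

struct Task {
    OpKind kind;
    std::string name;
    bool readsDdr = false;   // kernel inputs resolved from DDR
    bool writesDdr = false;  // kernel outputs written to DDR
};

// Walk the schedule and wrap every DDR-touching SW kernel with cache ops:
// invalidate before a DDR read so stale lines are not reused, and flush
// after a DDR write so other agents observe the data.
std::vector<Task> insertCacheHandlingOps(const std::vector<Task>& schedule) {
    std::vector<Task> out;
    for (const Task& t : schedule) {
        if (t.kind == OpKind::SwKernel && t.readsDdr)
            out.push_back({OpKind::CacheInvalidate, "invalidate_before_" + t.name});
        out.push_back(t);
        if (t.kind == OpKind::SwKernel && t.writesDdr)
            out.push_back({OpKind::CacheFlush, "flush_after_" + t.name});
    }
    return out;
}

int main() {
    std::vector<Task> schedule = {
        {OpKind::Dma, "dma_in"},
        {OpKind::SwKernel, "softmax", /*readsDdr=*/true, /*writesDdr=*/true},
        {OpKind::Dma, "dma_out"},
    };
    for (const Task& t : insertCacheHandlingOps(schedule))
        std::cout << t.name << "\n";
    return 0;
}
```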

DariaMityagina avatar May 19 '25 15:05 DariaMityagina

Hi Daria,

Thanks a lot for the detailed updates!

It sounds like choosing between DDR and CMX as backup storage for SHAVE will indeed be a trade-off that depends on the specific workload. A performance model to guide this decision during compilation would be very beneficial; a rough illustration of what I have in mind is sketched below. I'm looking forward to the upcoming releases and full support for these APIs. Is there a timeline for that?
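Just to make the idea concrete, even a simple back-of-envelope comparison could drive the choice at compile time. All names and numbers below are placeholders, not real NPU parameters:

```cpp
// Purely hypothetical cost model; bandwidth/latency values are placeholders.
#include <cstdio>

struct KernelProfile {
    double bytesMoved;       // total input+output bytes the kernel touches
    double accessesPerByte;  // how many times each byte is accessed on average
};

// Option A: DMA the working set into CMX, then run with fast local accesses.
double costViaCmx(const KernelProfile& k,
                  double dmaBandwidthBps, double cmxAccessNsPerByte) {
    double dmaTimeNs = k.bytesMoved / dmaBandwidthBps * 1e9;
    double computeTimeNs = k.bytesMoved * k.accessesPerByte * cmxAccessNsPerByte;
    return dmaTimeNs + computeTimeNs;
}

// Option B: execute directly from DDR through the SHAVE L2 cache.
double costFromDdr(const KernelProfile& k, double ddrAccessNsPerByte) {
    return k.bytesMoved * k.accessesPerByte * ddrAccessNsPerByte;
}

int main() {
    KernelProfile k{1 << 20, 4.0};              // 1 MiB working set, each byte reused 4x
    double viaCmx = costViaCmx(k, 25e9, 0.05);  // placeholder DMA bandwidth / CMX latency
    double fromDdr = costFromDdr(k, 0.25);      // placeholder DDR-backed access latency
    std::printf("via CMX: %.0f ns, from DDR: %.0f ns -> prefer %s\n",
                viaCmx, fromDdr, viaCmx < fromDdr ? "CMX" : "DDR");
    return 0;
}
```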

Regarding the SHAVE L2 cache partitions for concurrent users: could you clarify how these partitions are allocated (e.g., statically or dynamically) and how they share the 256KB L2 cache, especially considering its size relative to multiple users?

Kepontry avatar May 19 '25 16:05 Kepontry