sof icon indicating copy to clipboard operation
sof copied to clipboard

ipc: rework Zephyr IPC interfaces

Open dcpleung opened this issue 6 months ago • 25 comments

This reworks Zephyr IPC interface to utilize the newly introduced IPC message service, which provides a more generic IPC interface.

dcpleung avatar Jul 02 '25 20:07 dcpleung

Hmm, on MTL/ace15 test results look good, but on LNL/PTL, DSP panics on first IPC sent. Not sure immediately what can explain this, the code should be the same for all these platforms (aside some missing thing in DT definitions). The app/prj.conf is shared for all.

[  282.845982] kernel: snd_sof:snd_sof_run_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: booting DSP firmware
[  282.845988] kernel: snd_sof_intel_hda_common:hda_dsp_cl_boot_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: IMR restore supported, booting from IMR directly
[  282.847178] kernel: snd_sof_pci_intel_mtl:mtl_dsp_core_power_up: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x178d04]=0x2000101 successful
[  282.847182] kernel: snd_sof_pci_intel_mtl:mtl_dsp_cl_init: sof-audio-pci-intel-lnl 0000:00:1f.3: Primary core power up successful
[  282.847186] kernel: snd_sof_pci_intel_mtl:mtl_dsp_cl_init: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x73214]=0x80000000 successful
[  282.847206] kernel: snd_sof_pci_intel_mtl:mtl_enable_interrupts: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1800]=0x41 successful
[  282.847222] kernel: snd_sof_pci_intel_mtl:mtl_enable_interrupts: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1140]=0x1 successful
[  282.847239] kernel: snd_sof_pci_intel_mtl:mtl_dsp_cl_init: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x160200]=0x50000005 successful
[  282.960273] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx      : 0x1b080000|0x0: GLB_NOTIFICATION|FW_READY
[  282.960286] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 3 -> 6
[  282.960295] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx done : 0x1b080000|0x0: GLB_NOTIFICATION|FW_READY
[  282.960569] kernel: snd_sof:snd_sof_run_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: firmware boot complete
[  282.960588] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 6 -> 7
[  282.960616] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx      : 0x44000000|0x31400008: MOD_LARGE_CONFIG_SET [data size: 8]
[  282.960796] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[  282.960806] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ DSP dump start ]------------

@kv2019i do you know if baseFW is the destination for this large module set ? On teh positive side the exception handler is managing to send the exception IPC in part..

lgirdwood avatar Jul 03 '25 13:07 lgirdwood

From my limited debugging, it seems like on PTL, the busy bit is set after sending the first message, which prevents any more messages to be sent.

dcpleung avatar Jul 03 '25 20:07 dcpleung

the respective Zephyr PR https://github.com/zephyrproject-rtos/zephyr/pull/91606

lyakh avatar Jul 04 '25 08:07 lyakh

the respective Zephyr PR zephyrproject-rtos/zephyr#91606

Adding @ranj063 as we may have more users.

lgirdwood avatar Jul 04 '25 09:07 lgirdwood

I tested it a bit more. It seems like the exception comes when requesting D0->D3 transition more than one time. The test showed that the first D0->D3->D0 worked fine and FW was ready again. However, the second D0->D3 request generated the exception IPC.

dcpleung avatar Jul 14 '25 20:07 dcpleung

The issue with D0->D3->D0 is fixed so suspend/resume should be working on LNL and PTL.

dcpleung avatar Jul 16 '25 22:07 dcpleung

likely unrelated to this PR: a pause-resume failure on PTL similar to the one observed in other PRs. https://sof-ci.01.org/sofpr/PR10089/build14052/devicetest/index.html?model=PTLH_RVP_NOCODEC&testcase=multiple-pause-resume-50 . I'll retrigger tests to see if we can reproduce it

lyakh avatar Jul 18 '25 06:07 lyakh

SOFCI TEST

lyakh avatar Jul 18 '25 06:07 lyakh

Adding @kv2019i

lgirdwood avatar Sep 17 '25 14:09 lgirdwood

Zephyr side PR has been merged.

dcpleung avatar Sep 30 '25 20:09 dcpleung

FYI @dcpleung . Zephyr version in SOF was updated in https://github.com/thesofproject/sof/pull/10268 , but this does not pull in https://github.com/zephyrproject-rtos/zephyr/pull/96233 yet. There seems to be regressions with newer Zephyr, so the update was delayed (see discussion in #10268 ).

kv2019i avatar Oct 17 '25 08:10 kv2019i

Rebasing this trigger CI.

dcpleung avatar Oct 31 '25 17:10 dcpleung

@dcpleung Please mark as "ready for review" to get more review attention: I think this is ready to go, I'm already using this as a basis for my #10346

kv2019i avatar Nov 03 '25 10:11 kv2019i

@tmleman wrote:

Why do we need to do this? What drives this strong need to implement the IPC service in this form?

This really is to complete the Zephyr transition. I.e. https://github.com/thesofproject/sof/issues/5794 . The current code has two big problems:

  1. Intel SOF/Zephyr builds use an Intel specific IPC driver interface in Zephyr. This is not good, we should be using a IPC driver interface that can be implemened by multiple drivers for different hardware backends (just like we do for DMA, DAI, timers, etc). This is what this PR does.
  2. The original IPC Zephyr driver for Intel ADSP did not cover the full IPC interface (unlike we had in IPC3 in SOF). To really clean up (1), we need to expand the interface to cover the full IPC send/receive. I'm doing this on top of this PR in https://github.com/thesofproject/sof/pull/10346

kv2019i avatar Nov 05 '25 11:11 kv2019i

Ok, I see how using a more generic interface is beneficial and simply the right approach. However, this interface doesn't fit our hardware and checking a checkbox is a weak justification.

The original IPC Zephyr driver for Intel ADSP did not cover the full IPC interface

What exactly do you mean?

tmleman avatar Nov 05 '25 12:11 tmleman

@tmleman wrote:

Ok, I see how using a more generic interface is beneficial and simply the right approach. However, this interface doesn't fit our hardware and checking a checkbox is a weak justification.

To me this is fixing technical debt we have added. Prior to the Zephyr transaction, SOF had a IPC interface where common and platform logic was separated. With the Zephyr transition, we broke this and ended up having platform specific code end up to common SOF code (especially ipc-zephyr.c). This is now fixing that technical debt.

To me, we are not losing anything in when we use this interface with e.g. Intel DSP hardware.

The original IPC Zephyr driver for Intel ADSP did not cover the full IPC interface What exactly do you mean?

In short, in around v2.2, ipc_platform_send_msg() sent the whole message (header + payload) and platform drivers implemented this in different ways (on different hardware). The intel_adsp_ipc.h interface that was added to Zephyr (and ipc-zephyr.c to SOF), did not expose the functionality of this interface. The intel_adsp_ipc.h only covers the IPC header (pri+ext words) and the users of intel_adsp_ipc.h need to send/receive the IPC payload via some other interface. With this PR (and my #10346 ), I want to return us to the v2.2 level of semantics, so we have a clear split of IPC common and hw specific code. And as we now have drivers in Zephyr, the hw specific part should be in Zephyr drivers (and the IPC service framework backends provide a framework we can use for this without inventing something new).

kv2019i avatar Nov 05 '25 13:11 kv2019i

@kv2019i wrote:

ipc_platform_send_msg() sent the whole message (header + payload) and platform drivers implemented this in different ways (on different hardware).

It seems to me that you are forcibly connecting two different concepts. IPC as a protocol and IPC as a HW feature. If we are not planning to migrate the entire SOF code to Zephyr, then leaving the protocol logic on the SOF side is not necessarily an error. Our specific Intel IPC protocol utilizes two separate HW features: IPC and memory window. Using the memory window is optional. The IPC driver we had in Zephyr almost entirely covered the functionality of this IP. Nothing I read here convinces me that this is how it should look.

tmleman avatar Nov 05 '25 14:11 tmleman

@tmleman wrote:

ipc_platform_send_msg() sent the whole message (header + payload) and platform drivers implemented this in different ways (on different hardware).

It seems to me that you are forcibly connecting two different concepts. IPC as a protocol and IPC as a HW feature. If we are not planning to migrate the entire SOF code to Zephyr, then leaving the protocol logic on the SOF side is not necessarily an error. Our specific Intel IPC protocol utilizes two separate HW features: IPC and memory window. Using the memory window is optional. The IPC driver we had in Zephyr almost entirely covered the functionality of this IP. Nothing I read here convinces me that this is how it should look.

It's a fair question what is the correct abstraction level. And there's a reason why IPC has been of the last interfaces to be moved under generic Zephyr interfaces -- it's not an easy case.

Looking at this from SOF point of view. We have two protocols (IPC3 and IPC4) and dozens of supported hardware targets. IPC3/4 have differences. Both have 1-2 header words (that may be transferred inband with the "control plane" hardware interface) and a message payload (typically put to a fixed shared memory). There are of course many ways to abstract this, but we do have established practice in abstracting this at "struct ipc_msg *msg" level and we have implemented this over dozens of different hw targets in SOF.

There's clear common logic in SOF to send and receive IPC messages. Whether small messages can be sent inband (like IPC4 on Intel DSPs), it's an implementation detail that can be hidden in the driver (like we did in SOF IPC3). During the last few years, we've systematically moved code that varies from hw to another, to Zephyr drivers. Single interface, multiple drivers. In order to do this, we need a common interface for SOF IPC in Zephyr that can be implemented by different hw targets. For this to work, the interface has to be something that is generic enough. In Zephyr, the IPC service framework is the closest matching common interface used by different vendors for this (to send IPC messages between processors). Our SOF IPC3/4 has its quirks, but it seems it is possible to map it over the the generic interface in use.

Currently, for IPC, Intel uses src/ipc/ipc-zephyr.c file which directly talks to a Intel ADSP driver interface of one particular hardware type. Other vendors have IPC drivers implemented on SOF side, using old XTOS IPC driver interface.

We can of course reverse the course and keep the IPC abstraction on SOF side (e.g. move src/ipc/ipc-zephyr.c under sof/src/platform/intel/). This would be first such interface to be kept at SOF, so quite a big change, but certainly something to discuss now. I've been working towards https://github.com/thesofproject/sof/issues/5794 as our common architectural target and e.g. asked @dcpleung to do this work on Zephyr side.

kv2019i avatar Nov 06 '25 08:11 kv2019i

Currently, for IPC, Intel uses src/ipc/ipc-zephyr.c file which directly talks to a Intel ADSP driver interface of one particular hardware type.

I think this implementation doesn't address this problem. How the len parameter is used stems from the hardware architecture, and this driver is still Intel-specific, just less readable. I think we could move most of this logic to a Zephyr IPC backend and use the Intel IPC driver within it. Then here we could utilize the generic Zephyr API that could work with multiple backends (specific to each vendor). Below is a draft diagram illustrating roughly what I mean:

sequenceDiagram
    autonumber
    
    participant HOST as HOST Driver
    participant DRV as Intel ADSP<br/>IPC Driver<br/>(Backend)
    participant SVC as Zephyr<br/>IPC Service
    participant SOF as SOF IPC<br/>(ipc-zephyr.c)
    participant WQ1 as Work Queue 1<br/>(ipc_do_cmd)
    participant WQ2 as Work Queue 2<br/>(ipc_send_response)
    
    Note over HOST,WQ2: HOST sends IPC Command to DSP
    
    HOST->>DRV: Set TDR BUSY + command data: intel_adsp_ipc_isr()
    Note right of HOST: Write to TDR/TDD registers
    activate DRV
    Note right of DRV: ISR Context
    DRV->>SOF: handle_message() schedule IPC process
    activate SOF
    SOF->>SOF: ipc_schedule_process()
    deactivate SOF
    Note right of DRV: ACK to HOST:<br/>DSP received message<br/>(ends interrupt)
    DRV->>HOST: set TDR BUSY
    deactivate DRV

    Note over WQ1: Worker 1: Process Command
    activate WQ1
    WQ1->>SOF: ipc_platform_do_cmd()
    activate SOF
    activate SVC
    SOF->>SVC: cfg.received(data, len, priv)
    Note right of SOF: Backend calls<br/>registered callback
    SVC->>SOF: return data
    deactivate SVC
    SOF->>SOF: ipc_cmd(hdr)
    SOF->>SOF: ipc_prepare_to_send()
    SOF->>SOF: ipc_schedule_process()
    Note right of SOF: Schedule work queue task
    activate SVC
    SOF->>SVC: backend->release_rx_buffer(token, data)
    Note right of SVC: Signal processing complete
    SVC->>DRV: intel_adsp_ipc_complete()
    deactivate SVC
    deactivate SOF
    deactivate WQ1

    Note over WQ2: Worker 2: Send Response
    activate WQ2
    Note right of WQ2: Send response<br/>using IPC service API
    WQ2->>SOF: ipc_platform_send_msg()
    activate SOF
    SOF->>SVC: ipc_service_send(&ept, response, len)
    SVC->>DRV: intel_adsp_ipc_send_message()
    DRV->>HOST: Set IDR BUSY + response data
    Note right of DRV: Write to IDR/IDD<br/>Trigger HOST interrupt
    deactivate SOF
    deactivate WQ2
    
    DRV->>HOST: Interrupt (IDR BUSY set)
    activate HOST
    HOST->>HOST: Read IDR/IDD registers
    HOST->>HOST: Process response
    HOST->>DRV: Set IDA DONE bit
    Note right of HOST: ACK response received
    deactivate HOST

    Note over DRV: ACK received
    activate DRV
    Note right of DRV: intel_adsp_ipc_isr()
    DRV->>DRV: Check IDA DONE bit
    DRV->>DRV: Set IDA DONE bit
    deactivate DRV

We could even attempt to move the logic with memory window handling to the backend.

tmleman avatar Nov 06 '25 14:11 tmleman

@tmleman wrote:

I think this implementation doesn't address this problem. How the len parameter is used stems from the hardware architecture, and this driver is still Intel-specific, just less readable.

Thanks, the diagram is very useful. I see we are looking at this somewhat differently. I'm still not fully understanding the problem with 'len'. More specifically, while your diagram has it, we don't have such parameter in existing implementation. And I don't think this is feature of (Intel) hardware but a SW/FW protocol choice that applies to all SOF DSPs. Both IPC3 and IPC4 protocols have no concept of "length" for received messages. Basicly a fixed amount of data (SOF_IPC_MSG_MAX_SIZE) is expected to in the (optional) payload buffer, and it's upto the protocol handlers (ipc3/4-handler) to parse the message.

There was an early attempt (before IPC4) to migrate SOF to a TLV based IPC protocol which would have allowed common code to understand the size of the IPC messages (not a full OSI split, but at least more like in TCP/IP or OSI, where network code can understand size of the messages even if doesn't understand the protocol). Thta did nothappen and what we have now, this is not a Intel hw quirk, or IPC4 quirk, but is there for IPC3 as well and all hw targets SOF supports. The hw specific driver will use some hardware mechanism (ISR, doorbell registers) to wake up on reception of new data, provide upto 64bit of data inline and then a build-time defined set of bytes (SOF_IPC_MSG_MAX_SIZE) will be available to the DSP. Then IPC3/4 protocol code is called to parse the (optional) payload and figure out what data is there.

I don't think this PR changes this and I'd assume all hardware that currently has XTOS/IPC3 drivers, could use ipc-zephyr.c code after this PR.

kv2019i avatar Nov 07 '25 11:11 kv2019i

I'm still not fully understanding the problem with 'len'.

@param[in] len Number of bytes to send.

I don't think this PR changes this and I'd assume all hardware that currently has XTOS/IPC3 drivers, could use ipc-zephyr.c code after this PR.

With values like INTEL_ADSP_IPC_SEND_DONE?

The IPC service API doesn't meet our needs at all, why don't we modify it? Why don't we add a new one that would be more useful?

tmleman avatar Nov 07 '25 13:11 tmleman

@tmleman wrote:

I'm still not fully understanding the problem with 'len'. @param[in] len Number of bytes to send.

Aa, that 'len'. This has nothing to do with Intel DSP, but is a property of the Zephyr IPC framework.

But right, the naming is confusing, "enum intel_adsp_cb_len" should be something generic "enum ipc_adsp_cb_len". This will be same for all SOF IPC backends and not specific to Intel. This is just used to map SOF style IPC to the Zephyr IPC framework and 'len' is used to make the mapping. This will be same for all DSP hardware and not vendor specific.

The IPC service API doesn't meet our needs at all, why don't we modify it? Why don't we add a new one that would be more useful?

We tried the new one approach first. Daniel implemented, but it was rejected in Zephyr architecture review. Full details at https://github.com/zephyrproject-rtos/zephyr/pull/91606

.. so the IPC service based approach (https://github.com/zephyrproject-rtos/zephyr/pull/96233) is following feedback from Zephyr architecture forum. We of course have the option to not use this, and expose driver-specific interfaces instead and make the abstractions on SOF side (i.e. make ipc-zephyr.c -> ipc-zephyr-intel-adsp.c ).

kv2019i avatar Nov 07 '25 14:11 kv2019i

@kv2019i wrote:

We of course have the option to not use this, and expose driver-specific interfaces instead and make the abstractions on SOF side (i.e. make ipc-zephyr.c -> ipc-zephyr-intel-adsp.c ).

I think that would be a better approach.

tmleman avatar Nov 07 '25 15:11 tmleman

I'm still in favor of this PR. This allows to use existing infra to make new backends in Zephyr, and not invest in making new infrastructure on SOF side to handle multiple IPC hw implementations. But let's give a chance to other reviewers to respond and weigh in.

kv2019i avatar Nov 07 '25 15:11 kv2019i

This change seems unnecessary. We are forcing the use of the Zephyr interface, which does not fit our needs. To achieve this, we misuse it by passing the operation type as message length. This approach does not provide any real benefit from using the existing interface. We cannot swap in another driver anyway, because only ours will be incompatible in correct way to work.

I agree with Tomek that sof currently uses two independent hardware for ipc (ipc and memory window), and merging them into a single driver is not a good idea. I would keep ipc handling in sof as a separate component. This way, if someone wants to implement ipc differently (UART, Ethernet, etc.) they can disable intel-zephyr-ipc in the config and enable their own ipc implementation exposing the same interface.

softwarecki avatar Nov 12 '25 15:11 softwarecki