ipc: rework Zephyr IPC interfaces
This reworks the Zephyr IPC interface to use the newly introduced IPC message service, which provides a more generic IPC interface.
Hmm, on MTL/ace15 test results look good, but on LNL/PTL the DSP panics on the first IPC sent. Not immediately sure what explains this; the code should be the same for all these platforms (aside from something possibly missing in the DT definitions). The app/prj.conf is shared for all.
[ 282.845982] kernel: snd_sof:snd_sof_run_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: booting DSP firmware
[ 282.845988] kernel: snd_sof_intel_hda_common:hda_dsp_cl_boot_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: IMR restore supported, booting from IMR directly
[ 282.847178] kernel: snd_sof_pci_intel_mtl:mtl_dsp_core_power_up: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x178d04]=0x2000101 successful
[ 282.847182] kernel: snd_sof_pci_intel_mtl:mtl_dsp_cl_init: sof-audio-pci-intel-lnl 0000:00:1f.3: Primary core power up successful
[ 282.847186] kernel: snd_sof_pci_intel_mtl:mtl_dsp_cl_init: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x73214]=0x80000000 successful
[ 282.847206] kernel: snd_sof_pci_intel_mtl:mtl_enable_interrupts: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1800]=0x41 successful
[ 282.847222] kernel: snd_sof_pci_intel_mtl:mtl_enable_interrupts: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1140]=0x1 successful
[ 282.847239] kernel: snd_sof_pci_intel_mtl:mtl_dsp_cl_init: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x160200]=0x50000005 successful
[ 282.960273] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx : 0x1b080000|0x0: GLB_NOTIFICATION|FW_READY
[ 282.960286] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 3 -> 6
[ 282.960295] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx done : 0x1b080000|0x0: GLB_NOTIFICATION|FW_READY
[ 282.960569] kernel: snd_sof:snd_sof_run_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: firmware boot complete
[ 282.960588] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 6 -> 7
[ 282.960616] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x44000000|0x31400008: MOD_LARGE_CONFIG_SET [data size: 8]
[ 282.960796] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[ 282.960806] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ DSP dump start ]------------
@kv2019i do you know if baseFW is the destination for this large config set? On the positive side, the exception handler is managing to send the exception IPC, at least in part.
From my limited debugging, it seems that on PTL the busy bit stays set after sending the first message, which prevents any further messages from being sent.
The respective Zephyr PR: https://github.com/zephyrproject-rtos/zephyr/pull/91606
I tested it a bit more. It seems the exception comes when requesting a D0->D3 transition more than once. The test showed that the first D0->D3->D0 cycle worked fine and the FW was ready again; however, the second D0->D3 request generated the exception IPC.
The issue with D0->D3->D0 is fixed so suspend/resume should be working on LNL and PTL.
Likely unrelated to this PR: a pause-resume failure on PTL similar to the one observed in other PRs: https://sof-ci.01.org/sofpr/PR10089/build14052/devicetest/index.html?model=PTLH_RVP_NOCODEC&testcase=multiple-pause-resume-50 . I'll retrigger the tests to see if we can reproduce it.
SOFCI TEST
Adding @kv2019i
Zephyr side PR has been merged.
FYI @dcpleung . The Zephyr version in SOF was updated in https://github.com/thesofproject/sof/pull/10268 , but this does not pull in https://github.com/zephyrproject-rtos/zephyr/pull/96233 yet. There seem to be regressions with newer Zephyr, so the update was delayed (see discussion in #10268 ).
Rebasing this to trigger CI.
@dcpleung Please mark this as "ready for review" to get more review attention: I think this is ready to go, and I'm already using it as a basis for my #10346
@tmleman wrote:
Why do we need to do this? What drives this strong need to implement the IPC service in this form?
This really is about completing the Zephyr transition, i.e. https://github.com/thesofproject/sof/issues/5794 . The current code has two big problems:
- Intel SOF/Zephyr builds use an Intel-specific IPC driver interface in Zephyr. This is not good; we should be using an IPC driver interface that can be implemented by multiple drivers for different hardware backends (just like we do for DMA, DAI, timers, etc.). This is what this PR does.
- The original IPC Zephyr driver for Intel ADSP did not cover the full IPC interface (unlike what we had in IPC3 in SOF). To really clean up (1), we need to expand the interface to cover the full IPC send/receive. I'm doing this on top of this PR in https://github.com/thesofproject/sof/pull/10346
Ok, I see how using a more generic interface is beneficial and simply the right approach. However, this interface doesn't fit our hardware and checking a checkbox is a weak justification.
The original IPC Zephyr driver for Intel ADSP did not cover the full IPC interface
What exactly do you mean?
@tmleman wrote:
Ok, I see how using a more generic interface is beneficial and simply the right approach. However, this interface doesn't fit our hardware and checking a checkbox is a weak justification.
To me this is fixing technical debt we have added. Prior to the Zephyr transition, SOF had an IPC interface where common and platform logic was separated. With the Zephyr transition, we broke this and ended up with platform-specific code leaking into common SOF code (especially ipc-zephyr.c). This is now fixing that technical debt.
To me, we are not losing anything when we use this interface with e.g. Intel DSP hardware.
The original IPC Zephyr driver for Intel ADSP did not cover the full IPC interface
What exactly do you mean?
In short, around v2.2, ipc_platform_send_msg() sent the whole message (header + payload), and platform drivers implemented this in different ways (on different hardware). The intel_adsp_ipc.h interface that was added to Zephyr (and ipc-zephyr.c to SOF) did not expose the full functionality of that interface: intel_adsp_ipc.h only covers the IPC header (pri+ext words), and its users need to send/receive the IPC payload via some other interface. With this PR (and my #10346 ), I want to return us to the v2.2 level of semantics, so we have a clear split between IPC common and hw-specific code. And as we now have drivers in Zephyr, the hw-specific part should be in Zephyr drivers (and the IPC service framework backends provide a framework we can use for this without inventing something new).
@kv2019i wrote:
ipc_platform_send_msg() sent the whole message (header + payload) and platform drivers implemented this in different ways (on different hardware).
It seems to me that you are forcibly connecting two different concepts. IPC as a protocol and IPC as a HW feature. If we are not planning to migrate the entire SOF code to Zephyr, then leaving the protocol logic on the SOF side is not necessarily an error. Our specific Intel IPC protocol utilizes two separate HW features: IPC and memory window. Using the memory window is optional. The IPC driver we had in Zephyr almost entirely covered the functionality of this IP. Nothing I read here convinces me that this is how it should look.
@tmleman wrote:
ipc_platform_send_msg() sent the whole message (header + payload) and platform drivers implemented this in different ways (on different hardware).
It seems to me that you are forcibly connecting two different concepts. IPC as a protocol and IPC as a HW feature. If we are not planning to migrate the entire SOF code to Zephyr, then leaving the protocol logic on the SOF side is not necessarily an error. Our specific Intel IPC protocol utilizes two separate HW features: IPC and memory window. Using the memory window is optional. The IPC driver we had in Zephyr almost entirely covered the functionality of this IP. Nothing I read here convinces me that this is how it should look.
It's a fair question what the correct abstraction level is. And there's a reason why IPC has been one of the last interfaces to be moved under generic Zephyr interfaces -- it's not an easy case.
Looking at this from the SOF point of view: we have two protocols (IPC3 and IPC4) and dozens of supported hardware targets. IPC3 and IPC4 have differences, but both have 1-2 header words (which may be transferred inband over the "control plane" hardware interface) and a message payload (typically placed in fixed shared memory). There are of course many ways to abstract this, but we do have an established practice of abstracting it at the "struct ipc_msg *msg" level, and we have implemented this over dozens of different hw targets in SOF.
There's clear common logic in SOF to send and receive IPC messages. Whether small messages can be sent inband (like IPC4 on Intel DSPs) is an implementation detail that can be hidden in the driver (as we did in SOF IPC3). Over the last few years, we've systematically moved code that varies from one hw to another into Zephyr drivers: single interface, multiple drivers. To do the same here, we need a common interface for SOF IPC in Zephyr that can be implemented by different hw targets, and for this to work, the interface has to be generic enough. In Zephyr, the IPC service framework is the closest matching common interface used by different vendors for this (to send IPC messages between processors). Our SOF IPC3/4 has its quirks, but it seems possible to map it over the generic interface in use.
Currently, for IPC, Intel uses the src/ipc/ipc-zephyr.c file, which talks directly to an Intel ADSP driver interface for one particular hardware type. Other vendors have IPC drivers implemented on the SOF side, using the old XTOS IPC driver interface.
We can of course reverse course and keep the IPC abstraction on the SOF side (e.g. move src/ipc/ipc-zephyr.c under sof/src/platform/intel/). This would be the first such interface to be kept in SOF, so quite a big change, but certainly something to discuss now. I've been working towards https://github.com/thesofproject/sof/issues/5794 as our common architectural target and e.g. asked @dcpleung to do this work on the Zephyr side.
Currently, for IPC, Intel uses src/ipc/ipc-zephyr.c file which directly talks to a Intel ADSP driver interface of one particular hardware type.
I think this implementation doesn't address that problem. How the len parameter is used stems from the hardware architecture, and this driver is still Intel-specific, just less readable. I think we could move most of this logic to a Zephyr IPC backend and use the Intel IPC driver within it. Then, here, we could use the generic Zephyr API, which could work with multiple backends (specific to each vendor). Below is a draft diagram illustrating roughly what I mean:
```mermaid
sequenceDiagram
autonumber
participant HOST as HOST Driver
participant DRV as Intel ADSP<br/>IPC Driver<br/>(Backend)
participant SVC as Zephyr<br/>IPC Service
participant SOF as SOF IPC<br/>(ipc-zephyr.c)
participant WQ1 as Work Queue 1<br/>(ipc_do_cmd)
participant WQ2 as Work Queue 2<br/>(ipc_send_response)
Note over HOST,WQ2: HOST sends IPC Command to DSP
HOST->>DRV: Set TDR BUSY + command data: intel_adsp_ipc_isr()
Note right of HOST: Write to TDR/TDD registers
activate DRV
Note right of DRV: ISR Context
DRV->>SOF: handle_message() schedule IPC process
activate SOF
SOF->>SOF: ipc_schedule_process()
deactivate SOF
Note right of DRV: ACK to HOST:<br/>DSP received message<br/>(ends interrupt)
DRV->>HOST: set TDR BUSY
deactivate DRV
Note over WQ1: Worker 1: Process Command
activate WQ1
WQ1->>SOF: ipc_platform_do_cmd()
activate SOF
activate SVC
SOF->>SVC: cfg.received(data, len, priv)
Note right of SOF: Backend calls<br/>registered callback
SVC->>SOF: return data
deactivate SVC
SOF->>SOF: ipc_cmd(hdr)
SOF->>SOF: ipc_prepare_to_send()
SOF->>SOF: ipc_schedule_process()
Note right of SOF: Schedule work queue task
activate SVC
SOF->>SVC: backend->release_rx_buffer(token, data)
Note right of SVC: Signal processing complete
SVC->>DRV: intel_adsp_ipc_complete()
deactivate SVC
deactivate SOF
deactivate WQ1
Note over WQ2: Worker 2: Send Response
activate WQ2
Note right of WQ2: Send response<br/>using IPC service API
WQ2->>SOF: ipc_platform_send_msg()
activate SOF
SOF->>SVC: ipc_service_send(&ept, response, len)
SVC->>DRV: intel_adsp_ipc_send_message()
DRV->>HOST: Set IDR BUSY + response data
Note right of DRV: Write to IDR/IDD<br/>Trigger HOST interrupt
deactivate SOF
deactivate WQ2
DRV->>HOST: Interrupt (IDR BUSY set)
activate HOST
HOST->>HOST: Read IDR/IDD registers
HOST->>HOST: Process response
HOST->>DRV: Set IDA DONE bit
Note right of HOST: ACK response received
deactivate HOST
Note over DRV: ACK received
activate DRV
Note right of DRV: intel_adsp_ipc_isr()
DRV->>DRV: Check IDA DONE bit
DRV->>DRV: Set IDA DONE bit
deactivate DRV
```
We could even attempt to move the memory window handling logic to the backend.
@tmleman wrote:
I think this implementation doesn't address this problem. How the len parameter is used stems from the hardware architecture, and this driver is still Intel-specific, just less readable.
Thanks, the diagram is very useful. I see we are looking at this somewhat differently. I'm still not fully understanding the problem with 'len'. More specifically, while your diagram has it, we don't have such a parameter in the existing implementation. And I don't think this is a feature of (Intel) hardware but a SW/FW protocol choice that applies to all SOF DSPs. Neither the IPC3 nor the IPC4 protocol has a concept of "length" for received messages. Basically, a fixed amount of data (SOF_IPC_MSG_MAX_SIZE) is expected to be in the (optional) payload buffer, and it's up to the protocol handlers (ipc3/4-handler) to parse the message.
There was an early attempt (before IPC4) to migrate SOF to a TLV-based IPC protocol, which would have allowed common code to understand the size of IPC messages (not a full OSI split, but at least more like TCP/IP or OSI, where network code can understand the size of messages even if it doesn't understand the protocol). That did not happen, and what we have now is not an Intel hw quirk, or an IPC4 quirk: it is there for IPC3 as well, on all hw targets SOF supports. The hw-specific driver will use some hardware mechanism (ISR, doorbell registers) to wake up on reception of new data, provide up to 64 bits of data inline, and then a build-time-defined number of bytes (SOF_IPC_MSG_MAX_SIZE) will be available to the DSP. Then the IPC3/4 protocol code is called to parse the (optional) payload and figure out what data is there.
I don't think this PR changes this, and I'd assume all hardware that currently has XTOS/IPC3 drivers could use the ipc-zephyr.c code after this PR.
I'm still not fully understanding the problem with 'len'.
@param[in] len Number of bytes to send.
I don't think this PR changes this and I'd assume all hardware that currently has XTOS/IPC3 drivers, could use ipc-zephyr.c code after this PR.
With values like INTEL_ADSP_IPC_SEND_DONE?
The IPC service API doesn't meet our needs at all; why don't we modify it? Why don't we add a new one that would be more useful?
@tmleman wrote:
I'm still not fully understanding the problem with 'len'. @param[in] len Number of bytes to send.
Ah, that 'len'. This has nothing to do with the Intel DSP; it's a property of the Zephyr IPC framework.
But right, the naming is confusing: "enum intel_adsp_cb_len" should be something generic like "enum ipc_adsp_cb_len". It will be the same for all SOF IPC backends and not specific to Intel. It is just used to map SOF-style IPC onto the Zephyr IPC framework, and 'len' is used to make that mapping. This will be the same for all DSP hardware and not vendor specific.
The IPC service API doesn't meet our needs at all, why don't we modify it? Why don't we add a new one that would be more useful?
We tried the new-interface approach first. Daniel implemented it, but it was rejected in the Zephyr architecture review. Full details at https://github.com/zephyrproject-rtos/zephyr/pull/91606
.. so the IPC service based approach (https://github.com/zephyrproject-rtos/zephyr/pull/96233) follows the feedback from the Zephyr architecture forum. We of course have the option to not use it, expose driver-specific interfaces instead, and make the abstractions on the SOF side (i.e. rename ipc-zephyr.c -> ipc-zephyr-intel-adsp.c).
@kv2019i wrote:
We of course have the option to not use this, and expose driver-specific interfaces instead and make the abstractions on SOF side (i.e. make ipc-zephyr.c -> ipc-zephyr-intel-adsp.c ).
I think that would be a better approach.
I'm still in favor of this PR. It allows us to use existing infra to build new backends in Zephyr, instead of investing in new infrastructure on the SOF side to handle multiple IPC hw implementations. But let's give other reviewers a chance to respond and weigh in.
This change seems unnecessary. We are forcing the use of a Zephyr interface that does not fit our needs; to achieve this, we misuse it by passing the operation type as the message length. This approach does not provide any real benefit from using the existing interface: we cannot swap in another driver anyway, because only ours implements this mapping, so no other backend would work correctly with it.
I agree with Tomek that SOF currently uses two independent hardware blocks for IPC (the IPC registers and the memory window), and merging them into a single driver is not a good idea. I would keep IPC handling in SOF as a separate component. This way, if someone wants to implement IPC differently (UART, Ethernet, etc.), they can disable intel-zephyr-ipc in the config and enable their own IPC implementation exposing the same interface.