axi icon indicating copy to clipboard operation
axi copied to clipboard

axi_dw_downsizer: Fix `i_forward_b_beats_queue` underflow

Open colluca opened this issue 2 years ago • 4 comments

The forward_b_beat_push signal should be asserted every time the last W handshake in a burst occurs on the master port. However, pushing to the FIFO should also be conditional on forward_b_beat_full not being asserted.

The way this second condition was ensured allowed the W handshake to occur, without tracking it through a push to the queue. The corresponding B handshake would then trigger a pop, causing an underflow in the FIFO counter.

This PR corrects this behaviour, by stalling the W handshake until the FIFO is no longer full, so it is ensured that pushing to the FIFO is possible.

  • [ ] Revert first commit after review and wait for CI to pass (see comment below)

colluca avatar Oct 03 '23 10:10 colluca

The first commit modifies the testbench to replicate the error originally found in Occamy.

You can see from the Gitlab CI that the first commit fails with a Fatal: Trying to pop data although the FIFO is empty error. The second corrects this behaviour (CI times out because I incremented the number of tests, but the tests are passing).

I would revert the first commit and then I believe this should be ready to merge. If the MIN_WAIT_CYCLES and MAX_WAIT_CYCLES parameters could be changed at run time then we could keep some tests also in the modified conditions, but this should probably be addressed in a separate PR anyways.

colluca avatar Oct 11 '23 07:10 colluca

I think it would be valuable to ensure this corner case is triggered in the future to prevent this fix from being removed, so it would be great if we could figure out a way to keep this task triggering the error as part of the CI. Could we trigger this with a few individual transactions? Other than this, I did not look at the details yet and am not super familiar with the module, but I will look through it properly.

micprog avatar Feb 16 '24 09:02 micprog

Thanks Michael :)

It's been a long time so I have to familiarize myself with this again, but I'll look into options to keep it triggered. Probably have some notes in an old WR, but I remember it not being trivial. I'll let you know ASAP.

colluca avatar Feb 16 '24 09:02 colluca

@micprog I've looked into this again, but it's not trivial to replicate the error in a manner which can be consistently tested by the CI.

At this point I think we should just merge this accepting the risk you highlighted. If you give me green light, I will drop the first commit which was used to reproduce the error, such that the CI does not timeout. Then we should be able to merge.

colluca avatar Jun 07 '24 12:06 colluca

@colluca Sorry for the delay. I see what you mean. I rebased the branch on master, let's make sure the current CI is passing with this fix.

I also looked through the fix a bit more and understand what it aims to achieve. While it achieves what you are looking for, there may be some performance gain to be had by not only asserting w_valid when the forward_b_fifo is not full, but also making this conditional on the last bit of the corresponding W beat, as the B will only be returned once the last W beat is sent. This adjustment would allow almost all W beats to already be sent downstream for processing, only holding back the one that will trigger the B response that breaks the FIFO.

--- a/src/axi_dw_downsizer.sv
+++ b/src/axi_dw_downsizer.sv
@@ -736,7 +736,7 @@ module axi_dw_downsizer #(
             automatic addr_t slv_port_offset = AxiSlvPortStrbWidth == 1 ? '0 : w_req_q.aw.addr[idx_width(AxiSlvPortStrbWidth)-1:0];
 
             // Valid output
-            mst_req.w_valid = !forward_b_beat_full;
+            mst_req.w_valid = !(forward_b_beat_full && w_req_q.aw.len == 0);
             mst_req.w.last  = w_req_q.aw.len == 0;
             mst_req.w.user  = slv_req_i.w.user   ;

micprog avatar Jul 17 '24 09:07 micprog

@micprog what you describe sounds correct to me, and the specific expression in the RTL looks correct as well. Good catch :)

colluca avatar Jul 17 '24 20:07 colluca

@colluca I found a way to force the error by forcing significant stalls on the B channel. Please check out the testbench modification and let me know if it makes sense to you.

micprog avatar Jul 18 '24 13:07 micprog

I will resort the commits a bit to update the TB appropriately and rebase on master

micprog avatar Jul 18 '24 14:07 micprog