Random failures?
Hi,
we have implemented OpenHBMC on a custom board. It seemed to work well; the memory test passed, passed, passed...
However, I ran the memory test manually a few more times, and about one run in ten the 32-bit test fails. Is this maybe the KNOWN BUG with an initial transaction that fails?
UPDATE: sometimes the 16-bit test fails, so it is not the first-word issue. Confirmed: about 1 run in 10 fails consistently, in either the 32-bit or the 16-bit test. The 8-bit test seems to always pass; I have not seen an 8-bit failure yet.
UPDATE2: the FAILING design had a 100 MHz AXI bus clock; changing the AXI clock to 75 MHz seems to fix the problem. So the issue only appears when the AXI bus clock equals the HyperRAM clock!
UPDATE3: with a 75 MHz AXI clock the issue is less frequent, but it still happens, so there is a bug with AXI.
UPDATE4: with an 81.81818 MHz AXI clock there seem to be fewer failures.
STATUS: the 32-bit and 16-bit memory tests fail if executed in a loop. There seems to be a relation to the AXI bus clock: most failures at 100 MHz, fewer at 75 MHz, and even fewer at 81.81818 MHz.
But the failure rate is way too high for the IP core to be useful; this is a bad bug!
UPDATE 5: it seems the issue is also related to the BUFG clocking mode. We changed the FAILING design to use BUFIO/BUFR clocking and now we do not see any issues: the memory tests run in an infinite loop without failing, even with a 100 MHz AXI clock.
The loop ran for 4+ days without failures, so the BUFIO mode works well!
Hello, AnttiLukats!
I'm back. I am going to spend some time fixing all the issues found. Thanks a lot for your feedback!
Best regards!
However, I ran the memory test manually a few more times, and about one run in ten the 32-bit test fails. Is this maybe the KNOWN BUG with an initial transaction that fails?
Do you mean bug #8? Do you have a pull-down resistor on the RWDS line on your PCB? If not, that can be a problem.
UPDATE 5:
Glad to hear that at least the BUFIO/BUFR mode is working stably. I will look through the BUFG mode once more...
Could you also please tell me a bit more about the nature of the test failures? Is it data corruption, or does AXI hang? Or both? Or mixed? I would be glad to see test logs if you have them.
I converted to the BUFR version, so I cannot test BUFG at this time, but it was failing rather quickly, so if you make the memory test STOP on error and execute it in a loop, you should be able to catch this yourself very easily. In case you cannot reproduce it, we would be happy to send you Spartan-7 hardware where the issue can be reproduced!
https://shop.trenz-electronic.de/de/CR00107-01-CRUVI-carrier-board-with-AMD-Spartan-7
This is the board we are using. There is no pull-down on RWDS. I added an FPGA pull-down on RWDS, but that did not change anything.
As this failure only happens with the BUFG version, I am sure it is related to the Xilinx FIFOs and not the HyperRAM interface.
The failure was a memory test failure, usually in 8-bit mode, sometimes in 16-bit mode; I have not once seen the 32-bit test fail so far.
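For anyone trying to reproduce this, the "stop on error and run in a loop" idea above could look roughly like the sketch below. This is not the memory test that was actually run on the board; all names (mem_fill, mem_verify, mem_test_loop, mem_error_t) and the address-derived pattern are made up for illustration, and it assumes the HyperRAM window is memory-mapped, 4-byte aligned, and accessed uncached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Details of the first mismatch found (hypothetical structure). */
typedef struct {
    unsigned width;     /* access width in bits: 8, 16 or 32 */
    size_t   offset;    /* byte offset of the failing access */
    uint32_t expected;
    uint32_t actual;
} mem_error_t;

/* Address-derived pattern, truncated to the access width. */
static uint32_t pattern(size_t off, uint32_t seed, unsigned width)
{
    uint32_t v = ((uint32_t)off * 2654435761u) ^ seed;
    return (width == 32) ? v : (v & ((1u << width) - 1u));
}

/* Write the pattern over [base, base+len) using 8-, 16- or 32-bit stores. */
static void mem_fill(volatile uint8_t *base, size_t len, unsigned width,
                     uint32_t seed)
{
    assert(len % 4 == 0);
    for (size_t off = 0; off < len; off += width / 8) {
        uint32_t v = pattern(off, seed, width);
        switch (width) {
        case 8:  *(volatile uint8_t  *)(base + off) = (uint8_t)v;  break;
        case 16: *(volatile uint16_t *)(base + off) = (uint16_t)v; break;
        default: *(volatile uint32_t *)(base + off) = v;           break;
        }
    }
}

/* Read back and compare; returns 0 if clean, -1 on the first mismatch. */
static int mem_verify(volatile uint8_t *base, size_t len, unsigned width,
                      uint32_t seed, mem_error_t *err)
{
    for (size_t off = 0; off < len; off += width / 8) {
        uint32_t actual;
        switch (width) {
        case 8:  actual = *(volatile uint8_t  *)(base + off); break;
        case 16: actual = *(volatile uint16_t *)(base + off); break;
        default: actual = *(volatile uint32_t *)(base + off); break;
        }
        uint32_t expected = pattern(off, seed, width);
        if (actual != expected) {
            err->width = width;
            err->offset = off;
            err->expected = expected;
            err->actual = actual;
            return -1;
        }
    }
    return 0;
}

/* Run 8/16/32-bit tests up to max_passes times and STOP on the first error.
 * Returns the number of passes completed without error. */
static unsigned long mem_test_loop(volatile uint8_t *base, size_t len,
                                   unsigned long max_passes, mem_error_t *err)
{
    static const unsigned widths[] = { 8, 16, 32 };
    for (unsigned long pass = 0; pass < max_passes; pass++) {
        for (size_t i = 0; i < 3; i++) {
            uint32_t seed = (uint32_t)pass * 3u + (uint32_t)i;
            mem_fill(base, len, widths[i], seed);
            if (mem_verify(base, len, widths[i], seed, err) != 0) {
                printf("pass %lu: %u-bit FAIL at 0x%zx: exp 0x%08lx got 0x%08lx\n",
                       pass, err->width, err->offset,
                       (unsigned long)err->expected, (unsigned long)err->actual);
                return pass;
            }
        }
    }
    return max_passes;
}
```

On the target one would pass the base address and size of the HyperRAM window and a very large max_passes. Logging the width, offset, expected and actual value of the first mismatch is what makes it possible to tell data corruption apart from a bus hang.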
In case you cannot reproduce it, we would be happy to send you Spartan-7 hardware where the issue can be reproduced!
Thanks a lot! I already have two different boards with HyperRAMs and will try to reproduce the errors with them. I will run a long test, like you did. Also, if you share the schematics, I could make a bitstream for your board, so you will be able to run the memory test with logs coming out of the FT2232.