Mellanox 100G interface segmentation fault at the receiver side
Hi all,
By following the existing issues and docs, I managed to get some 100G Mellanox NICs (MT28800) apparently working. I installed the OFED LINUX 4.9.2.2.4 drivers and built MoonGen with the --mlx5 flag; as far as I can tell, the build produced no errors.
I tested the sending side with the libmoon/examples/pktgen.lua script and managed to send at 100 Gbps on 2 ports at the same time, with quite small packet sizes.
However, the problem comes when I add a receiver. It always segfaults, no matter what I put in the receiving code. I noticed that it receives some packets before crashing, so I wrote a program to count them and to check whether something was wrong with the packet contents:
function recv(queue, id)
    print("Listening....")
    local bufs = memory.bufArray()
    local count = 0
    while mg.running() do
        local rx = queue:recv(bufs)
        for i = 1, rx do
            count = count + 1
            local pkt = bufs[i]:getUdp4Packet()
            print(pkt.eth:getString() .. " " .. count)
        end
        bufs:freeAll()
    end
    print(id .. " Total Packets Received: " .. count)
end
I observed that it crashes with a segmentation fault after receiving about as many packets as there are RX descriptors (rxDescs). For example, with rxDescs set to 512, the program above prints:
...
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 503
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 504
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 505
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 506
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 507
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 508
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 509
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 510
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 511
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 512
Segmentation fault
Or, if I change it to, say, 4096:
...
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4088
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4089
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4090
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4091
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4092
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4093
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4094
ETH 98:03:9b:4d:d7:9c > 98:03:9b:4d:d7:9d type 0x0800 (IP4) 4095
Segmentation fault
Any idea what the problem could be?