libcsp

RDP and CRC32 dropping/duplicating packets

Open alegnani opened this issue 2 years ago • 7 comments

As the description says, when using both RDP and CRC32, packets get dropped. I tried using both RDP and CRC32 separately and they work fine by themselves.

The issue shows up when sending packets whose payload is a monotonically increasing counter over a connection that randomly corrupts bytes. Could it be that the ACK gets sent when the packet is received, before the packet is dropped due to a wrong CRC?

alegnani avatar Nov 16 '23 08:11 alegnani

Care to share how to reproduce?

yashi avatar Nov 20 '23 02:11 yashi

Sorry, I forgot to actually upload the code. csp_server.c and csp_client.c are replacements for the files in the example folder and can be built with python buildall.py. To run them I used ./build/examples/csp_server -z localhost -a 2 and ./build/examples/csp_client -z localhost -a 1 -C 2. The proxy that introduces corruption can be run with python lossy_zmq_proxy.py.

I am observing that packets arrive duplicated at the server. Additionally I have observed that it sometimes skips packets (receiving nr.4 and then nr.6), thus leading me to believe that the ACK is sent before checking the CRC, which then drops the packet.

code.zip

alegnani avatar Nov 20 '23 09:11 alegnani

I can't even make it work with CSP_O_NONE. Can you?

yashi avatar Nov 20 '23 12:11 yashi

I uploaded the wrong file. It should be CSP_O_RDP | CSP_O_CRC32.

These are the tests I performed:

  • CSP_O_RDP, packet_loss = 0.1 and no corruption:
    • Sending increasing counter works
  • CSP_O_CRC32, no packet_loss and corruption = 0.02:
    • Sending the same message over and over does not arrive corrupted.
    • If the CRC32 is wrong it gets dropped
  • CSP_O_CRC32 | CSP_O_RDP, packet_loss = 0.1 and corruption = 0.02:
    • Sending increasing counter does not work
    • Sometimes a duplicated message arrives
    • Sometimes a message gets skipped
    • Same thing happens with packet_loss = 0.0

alegnani avatar Nov 20 '23 14:11 alegnani

I don't know. I can't even reliably send the first packet, "HelloWorld:0", from csp_client to lossy_zmq_proxy.py using CSP_O_NONE. zmq_send() does not return any error, but I can't capture the packet with Wireshark, and lossy_zmq_proxy.py doesn't receive it either.

I know you are not concerned about CSP_O_NONE, but unless it works with CSP_O_NONE, I can't compare it with the other options.

	for (;;) {
		csp_packet_t * packet;
		while ((packet = csp_read(conn, 100)) != NULL) {
			if (csp_conn_dport(conn) == SERVER_PORT) {
				/* Process packet here */
				int recv;
				sscanf((char *)packet->data, "HelloWorld:%d", &recv);
				csp_print("Packet received on SERVER_PORT: %d\n", recv);
				csp_buffer_free(packet);
				if (recv != server_received) {
					csp_print("expected:%d, got: %d\n", server_received, recv);
					return;

Because csp_server.c returns from the server task when a received value is not the expected one, you will soon run out of buffers for any incoming packets; that is, the call to csp_buffer_get() in csp_zmqhub_task() of csp_if_zmqhub.c fails.

void * csp_zmqhub_task(void * param) {

	zmq_driver_t * drv = param;
	csp_packet_t * packet;
	const uint32_t HEADER_SIZE = (csp_conf.version == 2) ? 6 : 4;

	while (1) {
		int ret;
		zmq_msg_t msg;

		ret = zmq_msg_init_size(&msg, sizeof(packet->data) + HEADER_SIZE);
		assert(ret == 0);

		// Receive data
		if (zmq_msg_recv(&msg, drv->subscriber, 0) < 0) {
			csp_print("ZMQ RX err %s: %s\n", drv->iface.name, zmq_strerror(zmq_errno()));
			continue;
		}

		unsigned int datalen = zmq_msg_size(&msg);
		if (datalen < HEADER_SIZE) {
			csp_print("ZMQ RX %s: Too short datalen: %u - expected min %u bytes\n", drv->iface.name, datalen, HEADER_SIZE);
			zmq_msg_close(&msg);
			continue;
		}

		// Create new csp packet
		packet = csp_buffer_get(datalen - HEADER_SIZE);
		if (packet == NULL) {
			csp_print("RX %s: Failed to get csp_buffer(%u)\n", drv->iface.name, datalen);
			zmq_msg_close(&msg);
			continue;

I don't know whether this is due to ZMQ or not; I'm not an expert on it. But I found the following:

http://api.zeromq.org/2-1:zmq-send

A successful invocation of zmq_send() does not indicate that the message has been transmitted to the network, only that it has been queued on the socket and ØMQ has assumed responsibility for the message.

http://api.zeromq.org/3-2:zmq-socket

When a ZMQ_PUB socket enters the mute state due to having reached the high water mark for a subscriber, then any messages that would be sent to the subscriber in question shall instead be dropped until the mute state ends.

yashi avatar Nov 21 '23 10:11 yashi

Thank you very much for investigating the issue. If this is only an issue with ZMQ, then it's not an issue for me. This program was just a sanity check of whether the communication is actually reliable before using it on our cubesat. There should be no problems when using it with CAN, right? (We will definitely test that later on.) If that is the case, this issue can be marked as resolved.

alegnani avatar Nov 21 '23 14:11 alegnani

Hi, sorry, it's been a while, but I'd like to know whether you defined CSP_21? https://github.com/libcsp/libcsp/blob/f33f496ca3635e59103599df0a95a0dcee9c201c/src/csp_crc32.c#L95 With CSP_21 defined, the CRC covers both the message header and the data; otherwise, the CRC only protects the data part and not the header.

When I introduce corruption starting from the beginning of the data segment (without corrupting the message header), I don't encounter any issues with RDP + CRC. (CSP_21 is not defined.)

Edit: I saw a skipped message when running it overnight.

moonlight83340 avatar May 01 '24 05:05 moonlight83340

Hi,

It appears the issue occurred because the server could potentially accept a corrupted message if the CRC32 flag from the client was also corrupted. However, we can configure the server to only accept messages with CRC32.

You can review my issue and the response here: https://github.com/libcsp/libcsp/issues/596

To implement this, you can add the following line: sock.opts = CSP_SO_CRC32REQ; here: https://github.com/libcsp/libcsp/blob/develop/examples/csp_server.c#L43, and then run the example.

This setting will add a security check on the server side. I wasn't aware that we could use this option on the server side before.

I would also recommend defining CSP_21 so that CRC32 applies to the entire packet (header + data).

moonlight83340 avatar Jul 10 '24 03:07 moonlight83340