Serial-in not syncing the buffer
I am having a problem with a serial-in node. I have a 43 byte fixed-length packet that I need to receive into NR. The port is the UART on the Pi 3, ttyAMA0 (BT is disabled), at 19.2k baud. I do not have any character to use inside of the packet as the delimiter and I have no control over the machine that generates the packet, so I tried fixed-length and timeout, but both have issues. I set it up with timeout and set it to 100ms and about 5% of the packets are split in some random spot. All the data are there, just not split correctly. If I then change the node to a fixed-length config, at 43 bytes, it runs for days without fail, no mis-split packets, no lost data, nothing. Perfect! Until I restart NR. After I restart NR the 43 bytes come in, but it's splitting them at the wrong byte. If I restart NR a few more times it will get to the correct byte and remain there until the next restart. Also, if I change the serial-in node back to a timeout of something like 100ms, deploy, then change it back to 43 bytes and deploy, it has worked perfectly every time I've tried.
So that's a long way of saying the splitting isn't working right for the fixed-length option.
Please ask questions in Slack or the Google Group Mailing list as described in the Contributing Guide and linked to from the Node-RED home page.
This is going to be very hard to make work.
I think you are just getting lucky when you start the system the first time that data blocks line up.
If there is no way to tell the start/end of the block from the block itself, there is no way to re-sync if you start reading from the serial port in the middle of a block.
Hi Ben. I asked him to raise the issue after a chat on the mailing list. Yes not trivial but thought we should capture it.
In this case I have a 43 byte packet that transmits in under 30ms, then there's about 70ms of inactivity, then it repeats. So it's repeating every 100ms. When (if) I transmit my packet back to the device I need to do it during that 70ms. I've seen a lot of equipment over the years with oddball serial protocols like this that have no delimiting character, they just repeat and wait for a response between transmissions.
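As a rough check of those numbers, the time one packet spends on the wire can be worked out from the baud rate. This is only a sketch: it assumes 8N1 framing (10 bits per byte), which the thread does not state, and `frameTimeMs` is a made-up helper name.

```javascript
// Time on the wire for one frame, in milliseconds.
// bitsPerByte = 10 assumes 8N1 framing (1 start + 8 data + 1 stop bit);
// the framing of the real device is not given here, so this is an assumption.
function frameTimeMs(bytes, baud, bitsPerByte = 10) {
  return (bytes * bitsPerByte * 1000) / baud;
}

const onWire = frameTimeMs(43, 19200); // 43 bytes at 19.2k baud
const silence = 100 - onWire;          // remainder of a 100 ms cycle
console.log(onWire.toFixed(1), silence.toFixed(1)); // prints "22.4 77.6"
```

Under that assumption the packet takes roughly 22 ms, which agrees with "under 30ms" and leaves a silent gap of at least 70 ms per cycle.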
OK, so it sounds like you have a 70% chance of success when restarting. I'm not sure it will be possible to get the low-level timing information needed to find the gaps from the node-serial module that the Node-RED node is built on top of.
A new low level native NodeJS module is probably going to be needed using something like this technique (https://stackoverflow.com/questions/2917881/how-to-implement-a-timeout-in-read-function-call) to read bytes from the serial port until it blocks, then set a 50ms timeout on the read. If we make it to the timeout then we can be pretty sure we are in a gap and the next read should block for 20ms then return the first byte of the next block of data.
If the timeout was made configurable then it might be possible to make this reusable for different devices.
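A minimal sketch of that resync logic, written in plain JavaScript rather than as a native module. It assumes every chunk comes with a trustworthy arrival timestamp, which is exactly the information that may not be available from node-serial. `resyncOnGap`, `gapMs` and `frameLen` are hypothetical names, not part of any existing node.

```javascript
// Hypothetical helper, not part of the Node-RED serial node.
// events: array of { t: arrival time in ms, data: Buffer }, in arrival order.
// Nothing is emitted until a silence of at least gapMs has been seen;
// after that, data is cut into frames of frameLen bytes.
function resyncOnGap(events, gapMs, frameLen) {
  const frames = [];
  let synced = false;
  let lastT = null;
  let pending = Buffer.alloc(0);
  for (const { t, data } of events) {
    if (lastT !== null && t - lastT >= gapMs) {
      // Long silence: treat this chunk as the start of a new block
      // and drop any partial block collected before the gap.
      synced = true;
      pending = Buffer.alloc(0);
    }
    lastT = t;
    if (!synced) continue; // still in the middle of an unknown block
    pending = Buffer.concat([pending, data]);
    while (pending.length >= frameLen) {
      frames.push(pending.subarray(0, frameLen));
      pending = pending.subarray(frameLen);
    }
  }
  return frames;
}
```

With `gapMs` exposed as a setting (for example 50 for the device described above), the same logic could be reused for other devices with a different repeat rate.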
70% sounds about right - except I can force it to 100% with a little fiddling around. If I set the serial-in node to timeout and give it 100ms, for example, deploy, then change it back to 43 characters fixed and deploy it will line up and stay that way every time. It then runs for days at a time without any errors or lost bytes. If I redeploy or restart NR I have to do that process over again. I'm still not sure this is going to work on the serial-out, however, when I get to that. I think there will need to be some kind of a short gap (10ms) between the unit's transmit and my reply according to the datasheet.
Hi @realsquash, did you ever manage to get this to work? Would you be willing to test a potential solution? My idea is as follows: right now the timeout is set only after receiving the first character. If I understand your use case correctly, you're looking for some "inter-char timeout" instead. We could add a flag (or a fourth split message option) to account for that. If you set that to e.g. 5ms, you should be good to go.
If you want to give it a quick try, you can just comment out the if ( i === 1) condition on top of my pull request.
Cunning plan. Could you actually add that to the node-config-input-out choices in the PR? So existing becomes timeout from first, and new is timeout from last received.
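The difference between the two modes can be illustrated with a small simulation over timestamped chunks. This is not the code from the pull request; `splitByTimeout` and the mode names "first" and "last" are made up for the illustration, and the chunk timings are invented to match the 100 ms cycle described earlier.

```javascript
// Simulated splitter, for illustration only.
// events: array of { t: arrival time in ms, data: Buffer }, in arrival order.
// mode "first": the timer starts at the first chunk of a message.
// mode "last":  the timer restarts on every chunk (inter-byte silence).
function splitByTimeout(events, timeoutMs, mode) {
  const frames = [];
  let buf = Buffer.alloc(0);
  let deadline = 0;
  for (const { t, data } of events) {
    if (buf.length > 0 && t >= deadline) {
      frames.push(buf); // timer expired before this chunk arrived
      buf = Buffer.alloc(0);
    }
    if (buf.length === 0 || mode === "last") {
      deadline = t + timeoutMs;
    }
    buf = Buffer.concat([buf, data]);
  }
  if (buf.length > 0) frames.push(buf); // flush what is left at the end
  return frames;
}

// A 43 byte packet arriving as two chunks, with reading started mid-packet.
const packet = Buffer.from(Array.from({ length: 43 }, (_, i) => i));
const head = packet.subarray(0, 20);
const tail = packet.subarray(20);
const events = [
  { t: 10, data: tail },
  { t: 100, data: head }, { t: 115, data: tail },
  { t: 200, data: head }, { t: 215, data: tail },
];
console.log(splitByTimeout(events, 100, "first").map((f) => f.length));
console.log(splitByTimeout(events, 30, "last").map((f) => f.length));
```

In this simulation the "first" mode with 100 ms produces 43 byte messages that start at byte 20 of the packet, while the "last" mode with 30 ms emits one short 23 byte message and is aligned from then on.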
I will give this a try this coming weekend and let you know what happens!
Alright, I'll get this done by then. It's a trivial change in the end, and it might turn out to be useful to others as well. I shall remember to double check that a cap on the msg size is enforced.
OK done... @dceejay do you want me to add it on top of the existing PR #426? Wouldn't it make more sense to treat it as a separate topic after that one gets merged?
I'm good with it being part of this (now that I'm up to speed with it)
Hi @realsquash, did you get a chance to give it a try? Note that you should be able to install it using npm, see @dceejay's comment.
I didn't get a chance this weekend, but I'm heading out there tonight to give it a whirl.
I got a chance to install this and test over the past day or so. The new method works, but it does seem to lose track of the timing once in a while, which results in the buffer being split in the middle of the packet, and not always at the same byte. This averages once every 1-2 minutes and the CPU load is low, under 20% on average. Restarts do not affect the behavior.
There is also an issue when I deploy in NR - the serial in node will not receive any data unless I change a value in the config, such as the timeout, and deploy again. If I do something else and deploy then it will simply stop working unless I change a value before deploying.
Thank you for your feedback.
buffer splitting
Could you provide some more details?
- What value did you set for timeout?
- What is the serial interface (e.g. Rpi internal UART, USB-to-serial converter, etc?)
- What packet sizes do you get? Something like 43, 43, 35, 8, 43, 43... or 43, 43, 55, 31, 43...?
I was afraid using timing would not have been our best choice, though it came up for free so it was worth trying. If we can't figure this out, we might as well think about using both timing and packet length. Though I believe the right approach for such protocols would be to have a proper parser instead...
deployment
That pretty much sounds like a bug.
- What are you deploying: full, modified flows, modified nodes?
- Does it also occur when you use the old split-by-size method?
- In your flow, are you using "serial in" only? Or "serial out" as well? I guess you're not using serial request, right?
Hi @realsquash,
I think I was able to reproduce the issue you mentioned with deployment.
I'm using socat for testing on a Linux PC:
socat -d -d pty,raw,echo=0 pty,raw,echo=0
2018/07/12 23:15:52 socat[22128] N PTY is /dev/pts/14
2018/07/12 23:15:52 socat[22128] N PTY is /dev/pts/16
2018/07/12 23:15:52 socat[22128] N starting data transfer loop with FDs [5,5] and [7,7]
Then on a different shell I'm sending data:
while true; do dmesg > /dev/pts/16; done
And here is the flow:
[
{
"id": "7d51f35c.91c43c",
"type": "serial in",
"z": "2fe0ae98.59c74a",
"name": "",
"serial": "2f5a3cee.3d8f74",
"x": 120,
"y": 60,
"wires": [
[
"80a032ea.426458"
]
]
},
{
"id": "80a032ea.426458",
"type": "debug",
"z": "2fe0ae98.59c74a",
"name": "",
"active": true,
"tosidebar": true,
"console": false,
"tostatus": false,
"complete": "false",
"x": 290,
"y": 60,
"wires": []
},
{
"id": "2f5a3cee.3d8f74",
"type": "serial-port",
"z": "",
"serialport": "/dev/pts/14",
"serialbaud": "57600",
"databits": "8",
"parity": "none",
"stopbits": "1",
"newline": "1000",
"bin": "false",
"out": "interbyte",
"addchar": true,
"responsetimeout": "10000"
}
]
Notice here the silence timeout is exaggeratedly long. Essentially what I believe is happening is the internal buffer fills up without emitting any data, and this in turn slows things down so that the timeout will never trigger. I have a fix available.
@realsquash: On second thought, what I fixed is unlikely to be related to your use case. Since you're receiving 43 bytes every 100ms (i.e. 430 bytes/s), your deploy would have to take more than a minute for that amount of data to accumulate... I need more details to investigate this further.
Values for timeout don't affect the issue, but I have tried values between 20ms and 65ms. The interface is RPi internal UART (ttyAMA0) with an RS485 transceiver connected. Bluetooth is disabled. Packet sizes are like your example. No bytes are ever lost. Once in a while I will have a packet split into 3 or 4 parts, sometimes I'll see a packet not split for a few cycles (for example 172 bytes, so 43x4).
Based on my use case the timing method should be easy. I'm running 19.2k baud and sending 43 bytes, so I've got about 30ms of data transfer at the start of the transmission and then 70ms of silence. When I switch the node config to split on fixed-length (43), as long as it syncs at the correct spot, it will never lose a byte or split in the wrong spot (tested for months with zero errors). The problem with this method is that the sync point seems to be set while configured with a timeout and when I switch to fixed-length it splits the data perfectly. After I restart NR or the Pi it won't sync at the same spot and I get a 43 byte packet that starts at the wrong spot. If I change the node back to a timeout type, deploy, and then change it to fixed-length it will get back in sync.
On the deploy problem: I was only deploying modified, I didn't try the other methods. This happens regardless of the timing settings. I was using only 1 serial-in and 1 serial-out initially, but then I removed the serial-out and replaced with a serial-request just to give it a shot and see what it did. My deploys take a few seconds and the Pi is not under a high load at any point.
Thank you for sharing your thorough analysis. Jeez have you done your homework. I'm really surprised by your evidence though. We're talking big numbers here, it's not like we're pushing for nanosecond precision. I'm curious to understand wth is happening there, I'd love to have a look in my spare time but that's close to zero right now. Only thing we might want to look into would be for a way to flush the input buffer upon opening so that you can use the fixed length approach without your clever workaround. Which BTW is still working, right? Have you ever tried a full redeploy sticking with fixed length?
line 328 in 25-serial.js already has the flush in there - but commented out... so you could try uncommenting it to try.
Tonight I removed the comment on line 328 and it doesn't make any difference. A full deploy doesn't make any difference, either.
I actually have been running it using the silence option for days now and it keeps working, but I do get those dud packet splits often. The error rate is the same as when I used the old timeout method. I am not a programmer by trade, but I could probably figure out what's going on if I fool around with it long enough. I might give it a shot in the coming weeks. Thanks for taking a look at it.
Thanks for trying that. I'm still surprised that the new timeout mode doesn't help. Just to double check - you are getting 43 bytes (in about 30ms) every 100ms - so there should be a decent gap to detect... say 25-40ms. You may initially get a short packet at start if it starts mid packet - and that can easily be detected (by length) and discarded. But after that it ought to be solid... so yes - I'm confused why this isn't working.
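The length check could look something like the sketch below, for example inside a function node placed after the serial-in node. `keepWholeFrames` is a hypothetical helper. Besides dropping short messages, it also cuts up messages that are an exact multiple of the packet length (such as the 172 byte case reported earlier), on the assumption that a message produced by silence-based splitting always starts on a packet boundary.

```javascript
// Hypothetical post-filter, for illustration only.
// chunks: array of Buffers as emitted by the serial-in node.
// Keeps only data whose length is a whole multiple of frameLen and
// returns it as individual frames of frameLen bytes.
function keepWholeFrames(chunks, frameLen) {
  const frames = [];
  for (const chunk of chunks) {
    // Discard empty or partial messages (e.g. the short one at start-up).
    if (chunk.length === 0 || chunk.length % frameLen !== 0) continue;
    for (let i = 0; i < chunk.length; i += frameLen) {
      frames.push(chunk.subarray(i, i + frameLen));
    }
  }
  return frames;
}
```

Note that a packet that was split into several parts is dropped by this filter rather than reassembled, so it hides the mis-splits but does not explain them.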