Data loss if a lot of data is inserted simultaneously
Hello, the sender does not seem to be thread-safe. When attempting to send a lot of data to the table, large swaths of data are randomly lost. I suspect the reason is that the sender position is reset in the compact function without taking the order of execution of flush calls into account. So when multiple flush calls are scheduled on the event loop and flush 1 finishes before flush 2 starts, the buffer will already have been reset and the data pending for flush 2 will be lost.
With auto-flushing this issue is basically unavoidable, because the pending row counter is incremented in the "at" function, which is the same function that checks and schedules the flush, while the reset of this pending row counter happens in the compact function, which only runs after the data has been submitted. So if you are sending a lot of data in a large burst via many calls to "at", it is unavoidable to end up with multiple flush calls scheduled on the event loop.
IMO it would be most sensible to ensure the order of execution of flush calls in the library itself, for example by adding the flush calls to an in-memory promise queue instead of just awaiting them as is done now, which does not guarantee the order of execution of compact calls.
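A minimal sketch of that promise-queue idea, assuming a Sender-like object with an async flush(); the wrapper class and its name are illustrative, not part of the library:

```ts
// Illustrative sketch of an in-memory promise queue for flush calls.
// Each flush is chained onto the previous one, so flush N+1 (and the
// buffer compaction it triggers) can only start after flush N settled.
class SerializedFlusher {
  private tail: Promise<unknown> = Promise.resolve();

  constructor(private readonly sender: { flush(): Promise<unknown> }) {}

  flush(): Promise<unknown> {
    // Chain onto the previous flush; ignore its failure so one broken
    // flush does not wedge the queue forever.
    this.tail = this.tail.catch(() => {}).then(() => this.sender.flush());
    return this.tail;
  }
}
```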
Same issue here; in Node.js we also see lots of ERR_SOCKET_CLOSED_BEFORE_CONNECTION errors.
It would be nice if you could give the version I've refactored a try. We had similar problems, and we've been using this version for a while; it is quite a bit more stable.
It is quite easy to test:
https://github.com/questdb/nodejs-questdb-client/pull/42. Clone the pull request on your machine, run pnpm run build and pnpm run link; this will generate a 4.0.0 version on your machine that you can install in your project and test.
I've added a test to replicate the issue you describe and the test passes in my branch.
I think the data loss is the result of something else.
During the flush() call (auto or manual, it does not really matter), a new buffer is created, all rows that are terminated and ready to be sent are copied into this new buffer, then the main buffer is compacted and the state (including the pending row counter) is reset, and then the data is sent asynchronously. The send task works from the copy buffer, which does not change. Any calls after this write into the main buffer, which has been compacted. The next flush() call creates a copy of the main buffer again, and so on...
The above is the default behaviour.
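For illustration, a simplified sketch of that copy-on-flush sequence; this is not the library's actual code, and the class and method names are made up for the example:

```ts
// Simplified model of the copy-on-flush behaviour described above.
// The real Sender works on an internal buffer of serialized rows;
// a plain Node.js Buffer stands in for it here.
class CopyOnFlushBuffer {
  private buffer: Buffer = Buffer.alloc(0);
  private pendingRows = 0;

  append(row: Buffer): void {
    this.buffer = Buffer.concat([this.buffer, row]);
    this.pendingRows++;
  }

  flush(send: (data: Buffer) => Promise<void>): Promise<void> {
    // 1. Copy the rows that are ready to be sent into a detached buffer.
    const copy = Buffer.from(this.buffer);
    // 2. Compact the main buffer and reset the state synchronously,
    //    before any await, so later appends land in a clean buffer.
    this.buffer = Buffer.alloc(0);
    this.pendingRows = 0;
    // 3. Send the detached copy asynchronously; concurrent appends and
    //    later flushes cannot touch it.
    return send(copy);
  }
}
```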
You can pass the copy_buffer=off config option to the Sender to switch buffer copying off.
If you do that, all calls to the Sender touching the buffer have to be synchronized/sequential.
You could probably achieve this with the kind of promise chaining you can see in @semoal's PR.
This option is there to help those who send huge chunks of data, where creating copies of the buffer could result in memory pressure and lots of garbage.
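A rough sketch of what strictly sequential use with copy_buffer=off might look like, assuming the client's Sender.fromConfig API; the exact config string keys and column calls are illustrative and may differ between versions:

```ts
import { Sender } from "@questdb/nodejs-client";

async function ingest(): Promise<void> {
  // Assumed config string; check the client docs for the exact keys.
  // With copy_buffer=off there is no detached copy, so every call that
  // touches the buffer must complete before the next one starts.
  const sender = Sender.fromConfig("http::addr=localhost:9000;copy_buffer=off;");

  for (let i = 0; i < 1000; i++) {
    await sender
      .table("trades")
      .symbol("ticker", "QDB")
      .floatColumn("price", 42.5)
      .atNow(); // awaited: no overlapping writes into the shared buffer
  }

  await sender.flush(); // awaited before anything else touches the buffer
  await sender.close();
}

ingest().catch(console.error);
```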
However, I do not know of anyone who has ever used this option; we could probably remove it.
@semoal's PR already ignores this option.
Bottom line is, if you use the Sender with default settings (copy_buffer=on), all async sending tasks should have a copy of the data they are working on.
I have also run the test "ingests all data without loss under high load with auto-flush" from @semoal's PR against current main (version 3.0.0), and it runs without issues.
I also wonder whether you have tried out @semoal's PR. Maybe the problem is that the core http module cannot handle the load; in that case switching to undici could help.
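For context, a minimal sketch of posting an ILP payload with undici instead of the core http module; the /write endpoint and the payload are assumptions for the example, not the client's actual internals:

```ts
import { request } from "undici";

// Hypothetical: send a small ILP payload over HTTP with undici.
// QuestDB's ILP-over-HTTP endpoint is assumed to be /write here.
async function sendIlp(payload: string): Promise<void> {
  const { statusCode, body } = await request("http://localhost:9000/write", {
    method: "POST",
    body: payload,
  });
  // Drain the response body so the connection can be reused.
  await body.text();
  if (statusCode >= 300) {
    throw new Error(`ILP request failed with status ${statusCode}`);
  }
}

sendIlp("trades,ticker=QDB price=42.5\n").catch(console.error);
```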