[binance-stream] Optimised subscription for orderbooks / deltas
Previous solution:
- Fetched book snapshots on Web Socket thread, causing delays of other events (serious delays in case of multiple books + correctly implemented and used API rate limiter),
- Fetched book snapshot firstly, then counted that it will be possible to patch the snapshot with deltas received later via Web Socket, which could lead to a necessity of "re-syncing" (downloading a new snapshot), which definitely didn't help to match exchange quota of API usage.
Solution proposed in this PR:
- Fetches book snapshots on a dedicated thread, mitigating any delays on other, not related Web Socket events.
- Firstly subscribes for book deltas, starts buffering them, then downloads the book snapshot and patches from applicable buffered deltas. This approach highly increases a probability, we will have all the deltas required to patch the downloaded snapshot. In practice I didn't see any re-syncing of snapshots unless the Web Socket reconnected, which invalidates the snapshots.
Disclaimer: It is rather complicated PR. I provide the solution as it is;p don't quarantee it's bugfree for all the imagineable corner cases. It just works for me and in my use cases it makes a difference as day and night, books synchronise much faster, without this change I couldn't successfully synchronise 200+ books, it just taked ages. I would be thankful if sombody experienced in the library would review the changes and maybe, used his trusty benchmarks, testing if the change doesn't break the thing ;)
Assuming from the activity on bitrich-info/xchange-stream, I think @badgerwithagun is a good person to review these changes ;)
@earce please take a look, I think I have addressed Your comments.
What is important: I suggest to keep this PR open for a few next days, if something bad would happen after the changes, and I would detect that, I will push a fix here.
One more comment, also, I honestly am very allergic to synchronization blocks, it's significantly more expensive, atomics use CAS which is more efficient, also synchronization if you ever introduce another lock or even a non-re-entrant lock incorrectly can create deadlocks, just my 2 cents....
@earce please keep in mind the point where we have started, without my change, books are waiting for initial snapshots blocking the thread used to deliver web socket events, delaying everything, also each particular pair can download snapshots multiple times, trying to blindly catch the situation where it will be possible to patch it from upcomming deltas. And now we are discussing the drawbacks of synchronisation blocks, which are marginal in comparison;) I'm have seen on my eyes, dramatic change in performance after the changes proposed in this PR have been applied to my project.
One more comment, also, I honestly am very allergic to synchronization blocks, it's significantly more expensive, atomics use CAS which is more efficient, also synchronization if you ever introduce another lock or even a non-re-entrant lock incorrectly can create deadlocks, just my 2 cents....
@earce please keep in mind the point where we have started, without my change, books are waiting for initial snapshots blocking the thread used to deliver web socket events, delaying everything, also each particular pair can download snapshots multiple times, trying to blindly catch the situation where it will be possible to patch it from upcomming deltas. And now we are discussing the drawbacks of synchronisation blocks, which are marginal in comparison;) I'm have seen on my eyes, dramatic change in performance after the changes proposed in this PR have been applied to my project.
That is fine and the improvement is much appreciated but if we are going to make a change this drastic and invasive, makes sense to do it the best way we can and set good precedents.
In general this library uses observables and we aim to avoid the usage of synchronization blocks as much as possible, these act closer to actors and this starts straying away to be antipattern from the rest of the lib so I am trying to reduce how much we do this.
I hope you are finding this review helpful I just want to make sure we do this the most efficient/correct way possible.
One more comment, also, I honestly am very allergic to synchronization blocks, it's significantly more expensive, atomics use CAS which is more efficient, also synchronization if you ever introduce another lock or even a non-re-entrant lock incorrectly can create deadlocks, just my 2 cents....
@earce please keep in mind the point where we have started, without my change, books are waiting for initial snapshots blocking the thread used to deliver web socket events, delaying everything, also each particular pair can download snapshots multiple times, trying to blindly catch the situation where it will be possible to patch it from upcomming deltas. And now we are discussing the drawbacks of synchronisation blocks, which are marginal in comparison;) I'm have seen on my eyes, dramatic change in performance after the changes proposed in this PR have been applied to my project.
That is fine and the improvement is much appreciated but if we are going to make a change this drastic and invasive, makes sense to do it the best way we can and set good precedents.
In general this library uses observables and we aim to avoid the usage of synchronization blocks as much as possible, these act closer to actors and this starts straying away to be antipattern from the rest of the lib so I am trying to reduce how much we do this.
Understood ;) I spent a considerable amount of time, trying to solve everything reactiveX way. However failed to use rx in order to:
- buffer book update events
- at the same time try to download matching book snapshot
- then patch the initial snapshot with deltas
- then allow further events to be passed to observers of order books
And finally ended with these sync blocks... due to the main reason, that we need to sync book initialisation and all these patches, and then allow the book observable to access the book. Maybe if we would move this initial patching to the thread, where the book is observed... we could resign from most of the synchronisation, that would be possible, some unnecessary computations would be moved to the observable thread, but still, its a matter of milliseconds if not nanoseconds to apply all the buffered patches.
I do think if you left the code mostly as it was and just try and use the atomics mostly you would be fine. You will need synchronization when you want to modify 3 different objects atomically (or alternatively you could create a wrapper object and update it atomically and totally avoid synchronization).
Eg. a wrapper object with
private OrderBook book;
private long bookLastUpdateId;
private final Queue<DepthBinanceWebSocketTransaction> deltasBuffer = new LinkedList<>();
inside of an atomic called by updateAndGet(), or just how you previously had the code and a few synchronization blocks but just less of them, there's a few ways to do this, I just know this current iteration can be made better.
@earce see the updated impl, I have removed all the synchronisation blocks, now everything is guarded by volatile boolean bookInitialized attribute.
Remarks:
- I had to restore the concurrent queue of buffered deltas, this queue is accessed from two threads
- Other, past atomics, are volatiles now, I can keep them like that or change back to atomics, what do you think?
Personally I'm a bit allergic to volatiles :D Regardless the outcome, its good to keep this PR open for a few days, I need to retest the change.
I don't think the current iteration is threadsafe, theres a few points here that break the threadsafety. This doesn't exactly do what I've mentioned.
Do you not prefer to wrap all these in a single atomic class. Otherwise your synchronized implementation was better. There was only a few places I had suggested removing the synchronized.
I think you original implementation was better there were just a few places we could improve it, removing sync blocks everywhere like it currently is done isn't threadsafe.
I don't think the current iteration is threadsafe, theres a few points here that break the threadsafety. This doesn't exactly do what I've mentioned.
Do you not prefer to wrap all these in a single atomic class. Otherwise your synchronized implementation was better. There was only a few places I had suggested removing the synchronized.
I think you original implementation was better there were just a few places we could improve it, removing sync blocks everywhere like it currently is done isn't threadsafe.
We are in a dedicated class from the start: OrderBookSubscription. With main purpose to sync the asynchronous initialisation of the order book, hmm?
After I have removed all the sync blocks, it should be still thread safe. All volatiles are private in a dedicated OrderBookSubscription class, which protects external access to its state. Thread safety is now based on:
- concurent queue of buffered book deltas
-
volatile boolean bookInitializedwhich protects access to other volatile attributes, see that when we open the gate, thebookInitialisedis set to true as the latest one, so partially filled state won't be used by other thread which correctly verifies state:if (bookInitialised). Also, when we close the gate,bookInitialisedis set to false as the first of attibures which will be cleaned up.
The key difference is, with sync blocks it was harder to break everything with minor changes in the OrderBookSubscription ;) Of course there is also a possibility I missed something.
So, finally, I understand we can restore one of my intermediate implelemtations, where:
- I have removed atomics
- But still keept sync blocks
Right? I could update PR this way. I would like such solution by the way, I feel more comfortable with synchronized blocks.
Lets go back to an earlier implementation and I can review thread-safety, I think we can just cut down on the sync blocks, but unless you use an atomic object we can't remove them all.
And it's a little more specific than that related to OrderBookSubscription, it would make more sense to break up the classes where we call one OrderbookWrapper which doesn't have any methods, it just holds the state. And another one which manages all the logic.
Finally we wrap OrderbookWrapper in an atomic reference that way all updates to any of the member variables would happen with an updateAndGet(), this way operations on it would be atomic and thus thread-safe. This is another option, and CAS is a more efficient algorithm than synchronization blocks. In general synchronization blocks are a heavy handed approach, they aren't lock free and thus a less efficient way to do things. Also synchronization blocks take a ReadWriteLock which is the heaviest type. Another lockfree approach is using a ringbuffer but that is far far more involved.
I know this is beyond the scope of your review but might be nice to use something more elegant.
Has there been any progress on this? Id skip elegancy for any bugfix who increases resiliency :)
@gewure According the latest suggestion from @earce, I have reverted back to one of previous solutions (months ago), I use the code in state from this PR since then, without any issues. There were also no known issues with the intermediate proposals by the way ;) I have tested quite heavily, books quantities which were just impossible to synchronise without this change, work reliably, just due to the nature of the Binance API, it takes a while to collect (respecting the API limits).
A hint: the PR is conflicted currently on an order of imports, somebody with the configured project, could fix that easily.