gitbitex-new

High Availability and Data Reload on Program Restart

Open hayletdomybest opened this issue 1 year ago • 3 comments

Hi,

I have two questions regarding the system:

  1. Data Reload on Program Restart: When the program is closed and then reopened, where does it reload the orderbook, account, and other relevant data from the last session? Is there a specific mechanism in place to persist and restore this data on startup?

  2. High Availability Support: Does the system support high availability? If so, could you provide details on how this is implemented and what strategies are recommended to ensure redundancy and minimal downtime?

Thanks in advance for your help!

hayletdomybest avatar Sep 10 '24 00:09 hayletdomybest

1. EngineSnapshotThread periodically creates snapshots of the matching engine and saves them in MongoDB. When the matching engine starts, it reads the snapshot from MongoDB and restores its state from it.

2. The matching engine supports deploying multiple instances simultaneously, but only one instance is active while the others wait on standby. Once the active instance exits, another instance immediately takes over.
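
The snapshot-and-takeover flow described above can be sketched roughly as follows. This is an illustrative Python model, not code from gitbitex: `SnapshotStore` is a hypothetical in-memory stand-in for the MongoDB snapshot storage, `log` stands in for the Kafka command topic, and the `Engine` class, its fields, and the offset bookkeeping are assumptions made for the example.

```python
import copy
from dataclasses import dataclass


@dataclass
class Snapshot:
    offset: int   # last command offset covered by this snapshot
    orders: dict  # order_id -> remaining size


class SnapshotStore:
    """Hypothetical stand-in for the MongoDB snapshot storage."""

    def __init__(self):
        self._latest = None

    def save(self, snapshot):
        self._latest = copy.deepcopy(snapshot)

    def load(self):
        return copy.deepcopy(self._latest)


class Engine:
    def __init__(self, store):
        self.store = store
        self.offset = -1  # offset of the last applied command; -1 means none
        self.orders = {}

    def restore(self):
        # On startup, load the saved snapshot (if any) into memory.
        snap = self.store.load()
        if snap is not None:
            self.offset = snap.offset
            self.orders = snap.orders

    def apply(self, offset, order_id, size):
        self.orders[order_id] = size
        self.offset = offset

    def take_snapshot(self):
        # In the real system this would run periodically in the background.
        self.store.save(Snapshot(self.offset, dict(self.orders)))


# Command log standing in for the Kafka topic: one (order_id, size) per offset.
log = [("o1", 10), ("o2", 5), ("o3", 7), ("o4", 2)]

store = SnapshotStore()
active = Engine(store)
for offset in range(0, 2):
    active.apply(offset, *log[offset])
active.take_snapshot()    # snapshot covers offsets 0..1
active.apply(2, *log[2])  # applied in memory, but not in any snapshot

# The active instance exits. A standby restores the snapshot and replays
# every command after the snapshot's offset.
standby = Engine(store)
standby.restore()
for offset in range(standby.offset + 1, len(log)):
    standby.apply(offset, *log[offset])
```

In this model the standby ends up with all four orders even though the snapshot only covered the first two, because the remaining commands are replayed from the log after the restore.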

greensheng avatar Sep 10 '24 02:09 greensheng

I have reviewed the materials and have the following questions:

  1. I noticed that each time the matching consumer is executed, it starts by sending a CommandStartMessage with an incremented Sequence number. During the process, entities that appear to be updated also use this Sequence number as a base for incremental updates. Finally, a CommandEndMessage is sent. What is the main purpose of this Sequence, and why is it implemented this way?

  2. When multiple matching engines are started, only one leader node consumes messages due to Kafka's characteristics. Each node initializes by dumping data from MongoDB into memory. I'm wondering: since other slave nodes don't participate in the consumption process, will their data become out of sync? If the leader node fails, will the slave nodes have missing data? Or is there a mechanism in place to ensure synchronization across all nodes?

hayletdomybest avatar Sep 11 '24 01:09 hayletdomybest

  1. CommandStartMessage and CommandEndMessage mark the start and end of a transaction, respectively, which ensures data consistency when messages are processed downstream.
  2. As long as a slave node obtains a complete snapshot, it can start working normally. Replay can start from a snapshot taken at any position; this may produce some duplicate data, which is deduplicated downstream based on each message's sequence.
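
One way a downstream consumer could combine the two points above is sketched below. This is a hedged illustration, not the gitbitex implementation: the `(kind, sequence, payload)` message shape, the `DownstreamConsumer` class, and the assumption that a replay re-emits messages with the same sequence numbers are all made up for the example.

```python
class DownstreamConsumer:
    """Applies updates only for complete transactions and skips replayed ones."""

    def __init__(self):
        self.last_sequence = 0  # sequence of the last committed end message
        self.applied = []       # payloads applied so far, in order
        self._pending = None    # buffer for the currently open transaction

    def on_message(self, kind, sequence, payload=None):
        if sequence <= self.last_sequence:
            return  # duplicate produced by a replay; already applied
        if kind == "start":
            # A new start discards any half-finished transaction.
            self._pending = []
        elif kind == "update" and self._pending is not None:
            self._pending.append(payload)
        elif kind == "end" and self._pending is not None:
            # Commit the whole transaction at once.
            self.applied.extend(self._pending)
            self.last_sequence = sequence
            self._pending = None


consumer = DownstreamConsumer()

# Transaction 1 completes normally.
for msg in [("start", 1), ("update", 2, "a"), ("end", 3)]:
    consumer.on_message(*msg)

# The active engine fails in the middle of transaction 2 (no end message).
for msg in [("start", 4), ("update", 5, "b")]:
    consumer.on_message(*msg)

# A standby restores an older snapshot and replays from before transaction 1.
replay = [
    ("start", 1), ("update", 2, "a"), ("end", 3),
    ("start", 4), ("update", 5, "b"), ("update", 6, "c"), ("end", 7),
]
for msg in replay:
    consumer.on_message(*msg)
```

After the replay, `consumer.applied` holds each payload once: the repeated first transaction is dropped by the sequence check, and the interrupted second transaction is applied only after its end message arrives.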

greensheng avatar Sep 11 '24 09:09 greensheng