Configuration options to help restore run faster
Awesome tool. We had a use case to pull from our customer addresses table and needed to obfuscate the data to test why a query run in our production environment could not be replicated in staging. To do this, I dumped only the addresses table. ~8 million rows (about 4 GB) took about an hour to dump, which is fine. Restoring the dump is where I ran into issues: I started yesterday around 1pm EST, it's now 10am EST the following day, and only about 1.7 million records have made it into the replica DB.
Here is what my config file looks like:
```yaml
datastore:
  aws:
    bucket: random-replibyte
    region: us-east-1
    credentials:
      access_key_id: xxxxxx
      secret_access_key: xxxxxxxxxxx
destination:
  connection_uri: postgres://postgres:[email protected]:1234/postgres
  wipe_database: false
```
Are there any other destination options to help speed this up? I looked through the code base and nothing jumped out at me, but I wanted to ask in case I was missing something about how it batches its requests.
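For context, on the destination side I've been looking at plain Postgres tuning as a workaround. This is a sketch of standard Postgres parameters that commonly dominate bulk-insert throughput; none of these are Replibyte options, and the values are illustrative, not recommendations:

```sql
-- Server-level settings that often matter for one-off bulk loads
-- (illustrative values; revert after the restore).
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
SELECT pg_reload_conf();

-- Session-level, in the connection doing the inserts:
-- trades a small crash-loss window for much faster commits.
SET synchronous_commit = off;
```

That said, if Replibyte is issuing one INSERT per row in autocommit mode, server tuning alone won't close a 20-hour gap, which is why I'm asking about batching.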
Edit: here is where I was looking: https://github.com/Qovery/Replibyte/blob/1476dd7c248814201b704c1c3b3eaa4f8a6eb60c/replibyte/src/config.rs#L222