PGSync runs from the start instead of resuming from where it left off
PGSync version: 2.1.9
Postgres version: 12.7
Elasticsearch version: OpenSearch 1.0, hosted on AWS
Redis version: 5.0.7
Python version: 3.8.10
Problem Description: I am running PGSync in daemon mode (`pgsync --config /optional/path/to/schema.json --daemon`) with `ELASTICSEARCH_STREAMING_BULK=true`. The issue I am facing is that if I restart the PGSync process after some sort of failure, it starts from scratch and re-inserts all the records. Expected behavior: it should resume from where it left off. For example, if I am inserting 300 million records and the process stops at the 48M mark, restarting it should continue from 48M, not from 0.
Below is the env config I am using:
```shell
export PG_HOST=
export PG_PORT=5432
export PG_USER=
export PG_PASSWORD=
export ELASTICSEARCH_SCHEME=https
export ELASTICSEARCH_HOST=
export ELASTICSEARCH_PORT=443
export ELASTICSEARCH_USER=
export ELASTICSEARCH_PASSWORD=
export ELASTICSEARCH_TIMEOUT=100
export ELASTICSEARCH_CHUNK_SIZE=3000
export ELASTICSEARCH_VERIFY_CERTS=false
export ELASTICSEARCH_USE_SSL=true
export ELASTICSEARCH_SSL_SHOW_WARN=false
export ELASTICSEARCH_STREAMING_BULK=true
export ELASTICSEARCH_MAX_RETRIES=10
export ELASTICSEARCH_STREAMING_BULK=True
export ELASTICSEARCH_MAX_CHUNK_BYTES=80000
export ELASTICSEARCH_MAX_RETRIES=100000
export ELASTICSEARCH_INITIAL_BACKOFF=10
export ELASTICSEARCH_RAISE_ON_EXCEPTION=false
export ELASTICSEARCH_RAISE_ON_ERROR=false
export REPLICATION_SLOT_CLEANUP_INTERVAL=60
export POLL_TIMEOUT=0.1
```
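Since a crashed daemon currently has to be restarted by hand, a small supervision loop can at least keep the process alive between failures. This is a generic sketch and not part of PGSync itself; the backoff value is a placeholder, and on restart PGSync will still repeat the initial sync if it never completed.

```shell
#!/usr/bin/env sh
# Hypothetical supervision loop: rerun a command whenever it exits non-zero.
# Caveat: restarting does not resume a partial initial sync, since the
# checkpoint is only written once the initial sync completes.

BACKOFF=10  # seconds to wait between restarts (placeholder value)

run_with_restart() {
    # "$@" is the command to supervise; loop until it exits 0.
    until "$@"; do
        echo "command exited with status $?; restarting in ${BACKOFF}s" >&2
        sleep "$BACKOFF"
    done
}

# Example usage (not executed here):
#   run_with_restart pgsync --config /optional/path/to/schema.json --daemon
```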
- Can you describe the sort of failure you are experiencing?
- The initial sync is all or nothing in nature.
- We don't update the checkpoint until the initial sync is complete.
- It would be more productive to identify the reason for the failure you are experiencing and address that directly.
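To illustrate the all-or-nothing behavior described above, here is a minimal sketch of the checkpoint semantics. The class and file layout are hypothetical, not PGSync's actual implementation: the point is only that the checkpoint is persisted after the initial sync finishes, so a crash mid-sync leaves no checkpoint and the next run starts over.

```python
import os


class Checkpoint:
    """Hypothetical model of checkpoint behavior: persisted only once
    the initial sync completes, never partway through."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # No checkpoint file means the initial sync never completed.
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return int(f.read())

    def write(self, position):
        # Written only after the full initial sync succeeds.
        with open(self.path, "w") as f:
            f.write(str(position))


def sync(checkpoint, rows, fail_after=None):
    """Simulate an initial sync; optionally 'crash' partway through."""
    if checkpoint.read() is not None:
        return "resume"  # initial sync already done: stream changes instead
    for i, _ in enumerate(rows, start=1):
        if fail_after is not None and i > fail_after:
            raise RuntimeError("crash mid initial sync")
    checkpoint.write(len(rows))  # checkpoint only at the very end
    return "initial sync complete"
```

Under this model, a crash at row 48M of 300M leaves no checkpoint at all, which is why the restarted process re-inserts everything from row 0 rather than resuming.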