Improve and clarify error retry logic

Open absorbb opened this issue 3 years ago • 1 comments

Currently Jitsu retries errors indefinitely. That doesn't make much sense, because many kinds of errors cannot be resolved by retrying.
For streaming storages, errors lead to growth of the Redis queue.
Retry and fallback logic is unclear and undocumented.

Introduce a server.error_retry_period_hours configuration parameter that will work as the default for all destinations (streaming and batch). Default value: 24 hours
Introduce a DestinationConfig error_retry_period_hours parameter that will override the default value at the destination level.

uploader.go, unify fallback logic:

All errors (parsingErrors, failedEvents, resultPerTable.result.Err) must go to Fallback only after error_retry_period_hours has passed. (It seems that currently, for parsingErrors and failedEvents, we flood the fallback logs with copies of the same events on each uploader run.)
After error_retry_period_hours has passed, Jitsu needs to archive the incoming file and clean up its status.

streaming.go:

Don't use the IsConnectionError check – retry all errors.
Instead of the hardcoded 20 sec delay, introduce a server.streaming_retry_delay_minutes parameter. Default: 1
After server.error_retry_period_hours passes, stop retrying and send the failed events to Fallback.
Current fallback logic is hidden in abstract.go AccountResult and must be removed from there.

documentation:

Write an Error Handling and Retries documentation page that describes this logic and the configuration parameters.

Jul 27 '22 08:07 absorbb

No changes in uploader.go yet

Current fallback logic is hidden in abstract.go AccountResult and must be removed from there.

Not addressed yet.

Sep 05 '22 07:09 absorbb