jitsu
Improve and clarify error retry logic
Problem
- Currently Jitsu retries failed events indefinitely. That doesn't make much sense, because many kinds of errors cannot be resolved by retrying.
- For streaming storages, errors lead to growth of the Redis queue.
- Retry and fallback logic is unclear and undocumented.
Solution
- Introduce the `server.error_retry_period_hours` configuration parameter, which will work as the default for all destinations (streaming and batch). Default value: 24 hours.
- Introduce a DestinationConfig `error_retry_period_hours` parameter that will override the default value at the destination level.
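The override rule above could be sketched roughly as follows. This is only an illustration: `ServerConfig`, `DestinationConfig` (as trimmed down here) and `EffectiveRetryPeriodHours` are hypothetical names, not Jitsu's actual types, and the sketch assumes an unset server value falls back to the proposed 24-hour default.

```go
package main

import "fmt"

// DefaultErrorRetryPeriodHours mirrors the proposed 24-hour default.
const DefaultErrorRetryPeriodHours = 24

// ServerConfig is a hypothetical stand-in for the server-level settings.
type ServerConfig struct {
	ErrorRetryPeriodHours int // 0 means "not configured"
}

// DestinationConfig is a hypothetical, trimmed-down destination config.
type DestinationConfig struct {
	ErrorRetryPeriodHours *int // nil means "inherit the server value"
}

// EffectiveRetryPeriodHours resolves the retry period for one destination:
// destination override first, then the server value, then the default.
func EffectiveRetryPeriodHours(server ServerConfig, dest DestinationConfig) int {
	if dest.ErrorRetryPeriodHours != nil {
		return *dest.ErrorRetryPeriodHours
	}
	if server.ErrorRetryPeriodHours > 0 {
		return server.ErrorRetryPeriodHours
	}
	return DefaultErrorRetryPeriodHours
}

func main() {
	override := 6
	fmt.Println(EffectiveRetryPeriodHours(ServerConfig{}, DestinationConfig{}))                                                 // 24
	fmt.Println(EffectiveRetryPeriodHours(ServerConfig{ErrorRetryPeriodHours: 48}, DestinationConfig{}))                        // 48
	fmt.Println(EffectiveRetryPeriodHours(ServerConfig{ErrorRetryPeriodHours: 48}, DestinationConfig{ErrorRetryPeriodHours: &override})) // 6
}
```

Using a pointer for the destination-level value keeps "not set" distinct from an explicit number, which is one possible way to express the override.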
uploader.go: unify fallback logic:
- All errors (parsingErrors, failedEvents, resultPerTable.result.Err) must go to Fallback only after `error_retry_period_hours` passes. (It seems that currently, for parsingErrors and failedEvents, we flood fallback logs with copies of the same events on each uploader run.)
- After `error_retry_period_hours` passes, Jitsu needs to archive the incoming file and clean up its status.
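A minimal sketch of the unified decision for the uploader, assuming the time of the first failure for a file is tracked somewhere in its status. `UploaderAction`, `DecideUploaderAction`, `hours` and `exampleStart` are hypothetical names introduced for illustration; they do not come from uploader.go.

```go
package main

import (
	"fmt"
	"time"
)

// UploaderAction is a hypothetical result of one uploader run for a failed file.
type UploaderAction int

const (
	ActionRetry              UploaderAction = iota // keep the file, try again on the next run
	ActionFallbackAndArchive                       // write events to fallback, archive file, clean up status
)

func (a UploaderAction) String() string {
	if a == ActionFallbackAndArchive {
		return "fallback+archive"
	}
	return "retry"
}

// exampleStart is an arbitrary fixed timestamp used only in the examples.
var exampleStart = time.Date(2021, time.January, 1, 0, 0, 0, 0, time.UTC)

// hours converts a whole number of hours into a time.Duration.
func hours(n int) time.Duration { return time.Duration(n) * time.Hour }

// DecideUploaderAction applies the same rule to every error kind:
// retry until the retry period has elapsed since the first failure,
// then fall back and archive exactly once.
func DecideUploaderAction(firstFailureAt, now time.Time, retryPeriodHours int) UploaderAction {
	if now.Sub(firstFailureAt) >= hours(retryPeriodHours) {
		return ActionFallbackAndArchive
	}
	return ActionRetry
}

func main() {
	fmt.Println(DecideUploaderAction(exampleStart, exampleStart.Add(hours(1)), 24))  // retry
	fmt.Println(DecideUploaderAction(exampleStart, exampleStart.Add(hours(25)), 24)) // fallback+archive
}
```

Because fallback happens only once the period has elapsed, repeated uploader runs within the period would not write duplicate copies of the same events to the fallback logs.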
streaming.go:
- Don't use the `IsConnectionError` check; retry all errors.
- Instead of the hardcoded 20 seconds, introduce the `server.streaming_retry_delay_minutes` parameter. Default: 1.
- After `server.error_retry_period_hours` passes, stop retrying and send the failed events to Fallback.
- The current fallback logic is hidden in abstract.go `AccountResult` and must be removed from there.
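The streaming rules above might look something like the sketch below. `StreamingRetryPolicy`, `NewStreamingRetryPolicy`, `Next` and `exampleStart` are hypothetical names, not the actual code in streaming.go, and the sketch assumes each queued event carries the time of its first failure.

```go
package main

import (
	"fmt"
	"time"
)

// StreamingRetryPolicy is a hypothetical holder for the two proposed parameters.
type StreamingRetryPolicy struct {
	RetryDelay  time.Duration // from server.streaming_retry_delay_minutes
	RetryPeriod time.Duration // from server.error_retry_period_hours
}

// NewStreamingRetryPolicy builds a policy from the raw configuration values.
func NewStreamingRetryPolicy(delayMinutes, periodHours int) StreamingRetryPolicy {
	return StreamingRetryPolicy{
		RetryDelay:  time.Duration(delayMinutes) * time.Minute,
		RetryPeriod: time.Duration(periodHours) * time.Hour,
	}
}

// exampleStart is an arbitrary fixed timestamp used only in the examples.
var exampleStart = time.Date(2021, time.January, 1, 0, 0, 0, 0, time.UTC)

// Next is called after any failed insert, regardless of the error type.
// It returns the time of the next attempt and true while the retry period
// has not elapsed; otherwise it returns false, meaning the event should be
// written to fallback instead of being re-queued.
func (p StreamingRetryPolicy) Next(firstFailureAt, now time.Time) (time.Time, bool) {
	if now.Sub(firstFailureAt) >= p.RetryPeriod {
		return time.Time{}, false
	}
	return now.Add(p.RetryDelay), true
}

func main() {
	p := NewStreamingRetryPolicy(1, 24) // proposed defaults: 1 minute, 24 hours

	_, retry := p.Next(exampleStart, exampleStart.Add(10*time.Minute))
	fmt.Println(retry) // true

	_, retry = p.Next(exampleStart, exampleStart.Add(24*time.Hour))
	fmt.Println(retry) // false
}
```

Keeping this decision in one place, separate from result accounting, is one way the fallback logic could be made explicit rather than hidden.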
documentation:
- write Error Handling and Retries documentation page that describes that logic and configuration parameters
No changes in uploader.go yet
Current fallback logic is hidden in abstract.go AccountResult and must be removed from there.
Not addressed yet.