Retry mechanism
From reading the code, it appears that the scheduling of DataX tasks currently has no failure retry mechanism. My understanding of the process is:
1. Create a task and bind it to an execution node.
2. The service runs the task: it calls the run method for the bound execution node and, using Feign's load-balancing strategy, sends the request to a randomly chosen bound executor.
3. The executor starts the task itself, manages the tasks it is executing, and reports task execution status to the service through the heartbeat interface and the status reporting interface.
So there are two cases. First, if a task fails, only the failure status is reported: the task is not retried on the current node, and the service does not retry it on another execution node. Second, if the current execution node goes down, the task status is not updated (it stays stuck in its last state) until the node recovers.
Because the documentation is sparse, the above is based on my own reading of the code, and I am not sure it is accurate. I hope the owner can reply and confirm when convenient. Thanks!
@Davidhua1996
A retry mechanism for DataX tasks is not supported. The latest version is Exchangis 1.1.1; you may want to keep an eye on it.
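
For anyone who needs this behaviour in the meantime, the two cases raised in the question (a failed task, and an executor that has stopped reporting) could in principle be handled by a wrapper outside the scheduler. The following is only a rough sketch under assumptions: `RetrySketch`, `State`, `runWithRetry` and `isExecutorDown` are hypothetical names invented for illustration, and `runOnNode` stands in for whatever remote call actually starts the task. None of this is Exchangis code or its API.

```java
import java.util.List;
import java.util.function.Function;

public class RetrySketch {
    /** Hypothetical task states, not the real Exchangis state enum. */
    public enum State { SUCCESS, FAILED }

    /**
     * Case 1: retry a failed task, first on the same node, then on the next one.
     * Each executor is tried up to maxAttemptsPerNode times, in list order.
     * runOnNode is a placeholder for the remote "run" call (an assumption).
     */
    public static State runWithRetry(List<String> executors,
                                     int maxAttemptsPerNode,
                                     Function<String, State> runOnNode) {
        for (String node : executors) {
            for (int attempt = 1; attempt <= maxAttemptsPerNode; attempt++) {
                State result;
                try {
                    result = runOnNode.apply(node);
                } catch (RuntimeException e) {
                    // An unreachable node is treated like a failed attempt.
                    result = State.FAILED;
                }
                if (result == State.SUCCESS) {
                    return State.SUCCESS;
                }
            }
        }
        return State.FAILED; // every node exhausted its attempts
    }

    /**
     * Case 2: treat an executor as down once its last heartbeat is older
     * than timeoutMillis, so a stuck task can be marked failed and rerun.
     */
    public static boolean isExecutorDown(long lastHeartbeatMillis,
                                         long nowMillis,
                                         long timeoutMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // Simulated cluster: node "a" always fails, node "b" succeeds.
        State result = runWithRetry(List.of("a", "b"), 2,
                node -> node.equals("b") ? State.SUCCESS : State.FAILED);
        System.out.println("final state: " + result);
        System.out.println("executor down: " + isExecutorDown(0L, 60_000L, 30_000L));
    }
}
```

Trying the same node before moving on is just one possible policy; a real implementation would also need to make sure a task that is merely slow is not started twice.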