Node Prevent too many payment retries

(Planned in the second phase)

This card aims to address one of the very last vulnerabilities in our automation. We devs are confident that the retry mechanism will eventually succeed after a run of failed attempts, even through a series of interferences. Still, there is a lingering threat of getting stuck in the cycles if the underlying issue is external and persists. One such cause might be a sudden outage at the blockchain service provider.

All in all, even if this turns out to be a false concern, it is good practice to handle a runaway process like this gracefully, with properly defined behavior.

The issue with this seemingly plain demand is that the mechanism cannot simply be halted, with its last events left forgotten to resolve on their own. That's not how it works. Once submitted, a tx can only be resubmitted, never taken back. If you abandon it, you must be aware that it can still become a successful tx some time later.

We need to remember that some of the failures we declare are not "hard" failures but rather "soft" ones. We are not notified of the failure; we only presume it is one, based on rules we set ourselves.

If we want to escape from the iterations, we need to keep track of these supposed failures and write them into the db under a special identification. Once they are documented, the intent could be to terminate the retry process and schedule the regular new-payables scan.
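The idea above could be sketched roughly as follows. This is only an illustration under assumptions: `RetryTracker`, `NextStep`, and the `max_attempts` limit are hypothetical names that do not exist in the codebase, and the in-memory collections merely stand in for the db records.

```rust
use std::collections::HashMap;

/// Hypothetical decision returned after each supposed ("soft") failure.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextStep {
    /// Keep going with the retry cycle.
    Retry,
    /// Stop retrying; the tx was recorded as abandoned and the regular
    /// new-payables scan should be scheduled instead.
    AbandonAndRescan,
}

/// Hypothetical in-memory stand-in for the db records of supposed failures.
#[derive(Debug)]
pub struct RetryTracker {
    max_attempts: u32,
    attempts: HashMap<String, u32>,
    abandoned: Vec<String>,
}

impl RetryTracker {
    pub fn new(max_attempts: u32) -> Self {
        RetryTracker {
            max_attempts,
            attempts: HashMap::new(),
            abandoned: Vec::new(),
        }
    }

    /// Counts one more supposed failure for the tx and decides what to do next.
    pub fn record_soft_failure(&mut self, tx_hash: &str) -> NextStep {
        let limit_reached = {
            let count = self.attempts.entry(tx_hash.to_string()).or_insert(0);
            *count += 1;
            *count >= self.max_attempts
        };
        if limit_reached {
            // Move the tx under the special "abandoned" identification.
            self.attempts.remove(tx_hash);
            self.abandoned.push(tx_hash.to_string());
            NextStep::AbandonAndRescan
        } else {
            NextStep::Retry
        }
    }

    /// Txs we gave up on; they may still succeed later and need monitoring.
    pub fn abandoned(&self) -> &[String] {
        &self.abandoned
    }
}
```

With a limit of 3, the first two supposed failures of a tx would yield `NextStep::Retry` and the third `NextStep::AbandonAndRescan`. Where the limit should come from (a constant, config, or something adaptive) is left open here.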

We need an extra group of failures that keeps being monitored even after having been processed once. The second check makes sure that the payments we abandoned have not succeeded in the meantime. If they have, though, it has probably disrupted the latest batch.
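A minimal sketch of such a second check, assuming a hypothetical `ReceiptStatus` enum and a lookup closure in place of the real RPC call for the tx receipt (none of these names come from the codebase):

```rust
/// Hypothetical simplified view of what a receipt lookup can tell us.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReceiptStatus {
    Confirmed,
    Reverted,
    NotFound,
}

/// Hypothetical result of re-checking the abandoned txs.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Recheck {
    /// Abandoned txs that succeeded after all.
    pub late_successes: Vec<String>,
    /// Abandoned txs now known to have failed for real.
    pub confirmed_failures: Vec<String>,
    /// Still no receipt; these stay in the monitored group.
    pub still_unknown: Vec<String>,
}

/// Sorts abandoned tx hashes by what the lookup reports for each of them.
/// `lookup` is an assumption standing in for the receipt RPC call.
pub fn recheck_abandoned<F>(abandoned: &[String], mut lookup: F) -> Recheck
where
    F: FnMut(&str) -> ReceiptStatus,
{
    let mut result = Recheck::default();
    for hash in abandoned {
        match lookup(hash.as_str()) {
            ReceiptStatus::Confirmed => result.late_successes.push(hash.clone()),
            ReceiptStatus::Reverted => result.confirmed_failures.push(hash.clone()),
            ReceiptStatus::NotFound => result.still_unknown.push(hash.clone()),
        }
    }
    result
}
```

Anything that ends up in `late_successes` is the case described above, where an abandoned payment went through and the latest batch likely needs attention.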

In any case, we already have a mechanism for taking this into our records and for marking a given account as needing two-phase processing. It should be possible to add more candidates to this category.

Now the impression may be that what I just described imitates the standard pending-too-long process. If we halt the retries, then maybe wait a while longer (not necessarily), and attempt new payments, it's very likely we will pick up all the accounts that appeared in the ever-spinning retries we had to give up on. So, where's the difference? That question still needs an answer, but some subtle nuances can be found.

Jul 19 '25 13:07 bertllll

Consider also how we should constrain the ValidationFailure(s) if they keep racking up endlessly. It's a similar problem. There must be a hard limit.

Note: A ValidationFailure occurs when an RPC call for a tx receipt fails.

Aug 14 '25 09:08 bertllll

The critical question is: can there be a situation where we have stalled payments A, B and C, plus a potential payment D that would qualify if the new-payable scanner were run? Could repeating payment C, without any progress, be hopeless, while payment D would succeed if we tried it instead?

If never, then this card is completely pointless. If somehow possible, then it might provide some value.

Sep 19 '25 09:09 bertllll