flowable-engine icon indicating copy to clipboard operation
flowable-engine copied to clipboard

Bad behavior using Event Gateway with 2 timers

Open wberges opened this issue 3 years ago • 6 comments

Describe the bug I have designed a loop with an event gateway waiting for either a timer for intermediate loops or another timer for the end (to avoid endless loops). Unfortunately, I have an unexpected behavior stopping the workflow in some cases (and nothing in Dead Letter). In some other cases where the end timer succeeds (sometimes after optimistic exception), its theoric end time is far exceeded.

I have tested with the Event Gateway synchronous, asynchronous, and asynchronous exclusive, still with error.

Expected behavior Each workflow started should reach the end event after the end timer duration.

Code Test_actinst_table.zip

Screenshots: Model of the loop (only the PENDING branch is used): image 3 processes started: image After full execution (no more log in console), we have several end results: [SUCCESS] Process successfully ended: image [BUG] Process ended but with the intermediate loop timer in green...(??): image [BUG] Process NOT ended BUT stuck (ACT_RU_ACTINST history not deleted, but no activity with NULL END_TIME...): image

Additional context 6.7.2 with H2 (default config) in Tomcat 8

Console log:

step1 step2.1 step3.0 - init endLoopDate max end date for the loop: 12/20/2022 17:59:20 step3 12/20/2022 17:58:50 step1 step2.1 step3.0 - init endLoopDate max end date for the loop: 12/20/2022 17:59:23 step3 12/20/2022 17:58:53 step1 step2.1 step3.0 - init endLoopDate max end date for the loop: 12/20/2022 17:59:25 step3 12/20/2022 17:58:55 step3 12/20/2022 17:59:00 step3 12/20/2022 17:59:05 step3 12/20/2022 17:59:05 step3 12/20/2022 17:59:10 step3 12/20/2022 17:59:15 step3 12/20/2022 17:59:15 step3 12/20/2022 17:59:25 2022-12-20 17:59:25.135 ERROR 15356 --- [ask-Executor-13] ltAsyncRunnableExecutionExceptionHandler : Job a55e3bbe-8087-11ed-9077-6a545a21dcb1 failed

org.flowable.common.engine.api.FlowableOptimisticLockingException: JobEntity [id=a55e3bc1-8087-11ed-9077-6a545a21dcb1] was updated by another transaction concurrently at org.flowable.common.engine.impl.db.DbSqlSession.flushDeleteEntities(DbSqlSession.java:643) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.db.DbSqlSession.flushDeletes(DbSqlSession.java:598) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.db.DbSqlSession.flush(DbSqlSession.java:365) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:212) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.interceptor.CommandContext.close(CommandContext.java:70) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:114) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.spring.SpringTransactionInterceptor.lambda$execute$0(SpringTransactionInterceptor.java:57) ~[flowable-spring-common-6.7.2.jar:6.7.2] at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) ~[spring-tx-5.3.14.jar:5.3.14] at org.flowable.common.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:57) ~[flowable-spring-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:56) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:51) ~[flowable-engine-common-6.7.2.jar:6.7.2] at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.executeJob(ExecuteAsyncRunnable.java:127) ~[flowable-job-service-6.7.2.jar:6.7.2] at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:115) ~[flowable-job-service-6.7.2.jar:6.7.2] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na] at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]

step3 12/20/2022 17:59:25 step4.1 step3 step3 12/20/2022 17:59:35 12/20/2022 17:59:35 step3 step3 12/20/2022 17:59:45 12/20/2022 17:59:45 step3 step3 12/20/2022 17:59:55 12/20/2022 17:59:55 step3

step3 12/20/2022 18:01:05 12/20/2022 18:01:05 step3 step3 12/20/2022 18:01:15 12/20/2022 18:01:15 step3 step3 12/20/2022 18:01:25 12/20/2022 18:01:25 step3 step3 12/20/2022 18:01:35 12/20/2022 18:01:35 step3 12/20/2022 18:01:45 step3 12/20/2022 18:01:45 step3 step3 12/20/2022 18:01:55 12/20/2022 18:01:55 step3 step3 12/20/2022 18:02:05 12/20/2022 18:02:05 step3 12/20/2022 18:02:15 step3 12/20/2022 18:02:15 step3 step3 12/20/2022 18:02:25 12/20/2022 18:02:25 step3 step3 12/20/2022 18:02:35 12/20/2022 18:02:35 step3 12/20/2022 18:02:45 step3 12/20/2022 18:02:45 step3 step3 12/20/2022 18:02:55 12/20/2022 18:02:55 step3 step3 12/20/2022 18:03:05 12/20/2022 18:03:05 step3 step3 12/20/2022 18:03:15 12/20/2022 18:03:15 step3 step3 12/20/2022 18:03:25 12/20/2022 18:03:25 step4.1

ACT_RU_ACTINST: issue_eventgtw_actinst.docx

wberges avatar Dec 20 '22 17:12 wberges

Hello, Did someone investigate on this issue? If more info are needed, please ask (even if there are certainly enough details). Thanks

wberges avatar Oct 06 '23 10:10 wberges

Hello, Did someone investigate on this issue? FYI we have encountered the same problem with an event message + a timer. It seems that, when both branches are activated at the same time (we receive a message when the timer is triggered, and vice versa), a FlowableOptimisticLock exception is raised, blocking the deletion of one of both events (e.g. the message event). At the end, 1 branch is able to continue, when the other one stays forever (dead), even if the workflow continues (and ends). Thanks

wberges avatar Oct 23 '23 14:10 wberges

Hi @wberges, just a random observer here ...

Observations:

  • you have a lot of flowable:async="true" at random places, which is quite unusual (and can have consequences):
    • at <script> steps: is it to not block on sleep(5000)?
    • at <eventBasedGateway>: shouldn't be needed, since by def this gateway suspends the execution, also it's not documented what this flag would do on it (didn't check the Flowable code, maybe it ignores the flag here)
  • at the gtw_if_fork x-gateway, seems strange to have a default unconditional branch as the first one, since the branches are supposed to be evaluated in XML-order, so the unconditional one should probably be the last one
  • as an alternative to the <eventBasedGateway>, you can just fork right after the gtw_if_join step via a <parallelGateway> into 2 branches:
    • the time-out branch, containing steps: step3.0 (to compute endLoopDate), the time-out timer (using that endLoopDate), and the final step4.1
    • the looping branch, containing steps: step3 (producing the result), the x-gateway (checking the result), the repeatedTimer (if pending, loops back to step3), and the final alternatives step4 and step4.2.

However, not sure that any of that would explain (or solve) the cross-process instance interference you suspect to observe.

karel-cernohorsky avatar Jun 10 '24 12:06 karel-cernohorsky

Hello, I don't understand how you could replace an event based gateway (exclusive gateway where only the first event received will go in the right branch) by a parallel one (which will execute all the branches). Anyway, the goal was to create a valid and simple test BPMN to show the error I encountered in another context 😉 And the error occurs. Thanks for the investigation in all cases.

wberges avatar Jun 10 '24 18:06 wberges

Hi,

the <parallelGateway> of course requires an additional mechanism, to be able to "kill" the other branch — either by throwing a signal at it (received by a <boundaryEvent> at the other branch), or by exiting via a terminating <endEvent> (i.e. with nested <terminateEventDefinition />), if both branches share the same (sub-)process.

Arguably, the <eventBasedGateway> (if / when it works correctly) seems like a cleaner concept — though in a more complex setup, I'd anyway expect another parallel branch (or branches) to activate one of the <eventBasedGateway>'s outgoing flow. And I do understand that you simplified the flow to showcase the suspected error.

Is the issue still present it the latest Flowable release? I didn't yet get to reproducing locally what you observed, but will definitely try — simplifying the flows via <eventBasedGateway> is appealing.

karel-cernohorsky avatar Jun 11 '24 07:06 karel-cernohorsky

> or by exiting via a terminating endEvent In this case, you will encounter another issue still not solved (#3577) 😋

wberges avatar Jun 11 '24 17:06 wberges