Incomplete Server Output on Many Parallel Steps
Describe the bug
When many steps are executed in parallel, sometimes the step completion message is omitted from the server output.
Censored example (steps [0] and [3] start but never print a Complete line):
exec_cn0... [hn_top.post_message_to_slack[0]] Starting http: POST https://hooks.slack.com/services/<censored>
exec_cn0... [hn_top.post_message_to_slack[1]] Starting http: POST https://hooks.slack.com/services/<censored>
exec_cn0... [hn_top.post_message_to_slack[2]] Starting http: POST https://hooks.slack.com/services/<censored>
exec_cn0... [hn_top.post_message_to_slack[3]] Starting http: POST https://hooks.slack.com/services/<censored>
exec_cn0... [hn_top.post_message_to_slack[4]] Starting http: POST https://hooks.slack.com/services/<censored>
exec_cn0... [hn_top.post_message_to_slack[5]] Starting http: POST https://hooks.slack.com/services/<censored>
exec_cn0... [hn_top.post_message_to_slack[2]] Complete: 200 270ms
exec_cn0... [hn_top.post_message_to_slack[4]] Complete: 200 269ms
exec_cn0... [hn_top.post_message_to_slack[5]] Complete: 200 273ms
exec_cn0... [hn_top.post_message_to_slack[1]] Complete: 200 284ms
exec_cn0... [hn_top] Complete 319ms exec_cn0...
Flowpipe version (flowpipe -v)
v0.2.2 (& main)
To reproduce
Run many steps in parallel.
Expected behavior
Every step should display its complete or error message from the finish handler.
Additional context
The issue stems from es/handler/step_finished.go, where some code returns nil if the pipeline execution is in one of several states:
// If the pipeline has been canceled or paused, then no planning is required as no
// more work should be done.
if pex.IsCanceled() || pex.IsPaused() || pex.IsFinishing() || pex.IsFinished() {
	return nil
}
It seems the pipeline execution is already in the finishing state before all of the step finished events have been handled, so the handler returns early and never reaches the code that renders the server output.
This happens because the step is already complete, so the planner sets the pipeline to finishing before the step finished handler is called.
The server output therefore needs to be rendered after the step has been processed but before the step finished handler runs. Finding the right place for this is difficult because of retries, etc.
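To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not Flowpipe code: `pipelineState`, `stepFinished`, `handle` and `run` are hypothetical names invented for this example, and the point at which the pipeline flips to finishing is hard-coded. It only shows how a state guard placed before the rendering code drops a completion line, and how rendering before the guard keeps it. Retries are ignored here, which is exactly the part that makes the real placement hard.

```go
package main

import "fmt"

// pipelineState is a hypothetical stand-in for the pipeline execution status.
type pipelineState int

const (
	running pipelineState = iota
	finishing
)

// stepFinished is a hypothetical, simplified step finished event.
type stepFinished struct {
	step   string
	status int
}

// handle mimics the shape of the handler discussed above.
// With renderFirst == false the state guard runs before rendering, so a
// pipeline that is already finishing produces no output for the step.
// With renderFirst == true the completion line is emitted before the guard.
func handle(state pipelineState, ev stepFinished, renderFirst bool, out *[]string) {
	line := fmt.Sprintf("[%s] Complete: %d", ev.step, ev.status)
	if renderFirst {
		*out = append(*out, line)
	}
	if state == finishing {
		return // no planning required, same idea as the early return nil
	}
	if !renderFirst {
		*out = append(*out, line)
	}
	// Planning of follow-up work would happen here.
}

// run replays three step finished events. As an assumption for this sketch,
// the planner marks the pipeline as finishing before the last event is handled.
func run(renderFirst bool) []string {
	var out []string
	events := []stepFinished{{"step[0]", 200}, {"step[1]", 200}, {"step[2]", 200}}
	state := running
	for i, ev := range events {
		if i == len(events)-1 {
			state = finishing
		}
		handle(state, ev, renderFirst, &out)
	}
	return out
}

func main() {
	fmt.Println("guard first: ", len(run(false)), "completion lines")
	fmt.Println("render first:", len(run(true)), "completion lines")
}
```

In this sketch the guard-first variant loses the last step's line, while the render-first variant reports all three. A real fix would also have to decide what to print for attempts that are going to be retried.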
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been stalled for 90 days with no activity.