temporal icon indicating copy to clipboard operation
temporal copied to clipboard

Signal to self error in Workflow code

Open pauldemarco opened this issue 5 years ago • 7 comments

Expected Behavior

A workflow should be able to signal any workflow using a workflow ID, including itself.

Actual Behavior

Throws a UnknownExternalWorkflowExecution when attempting to signal itself.

Steps to Reproduce the Problem

  1. Try the following workflow code:
func TestWorkflow(ctx workflow.Context) error {
	we := workflow.GetInfo(ctx).WorkflowExecution
	err := workflow.SignalExternalWorkflow(ctx, we.ID, we.RunID, "messages", "hello").Get(ctx, nil)
	if err != nil {
		panic(fmt.Errorf("could not signal external workflow: %v", err))
	}
	return nil
}
  1. Error shows in temporal web history: image

  2. The worker logs the error:

2020/08/18 21:29:57 ERROR Workflow panic. Namespace default TaskQueue test WorkerID 465773@beast-ubuntu@ WorkflowID TestWorkflow RunID 254db1e2-ebcc-489a-9835-1f5e2e3fb8f8 PanicError could not signal external workflow: UnknownExternalWorkflowExecution PanicStack coroutine root [panic]:
myproject.com/domain/root.TestWorkflow(0x1003500, 0xc000498a00, 0x0, 0x0)
        /home/paul/myproject.com/domain/root/testworkflow.go:41 +0x17a
reflect.Value.call(0xd69c40, 0xefbd48, 0x13, 0xeaf2fa, 0x4, 0xc000486de0, 0x1, 0x1, 0xfd2270, 0xe8e420, ...)
        /usr/local/go/src/reflect/value.go:460 +0x8ab
reflect.Value.Call(0xd69c40, 0xefbd48, 0x13, 0xc000486de0, 0x1, 0x1, 0xfd2270, 0xe8e420, 0xc0004b4640)
        /usr/local/go/src/reflect/value.go:321 +0xb4
go.temporal.io/sdk/internal.(*workflowEnvironmentInterceptor).ExecuteWorkflow(0xc0004b4640, 0x1003500, 0xc000498a00, 0xc0000362c0, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/paul/go-sdk/internal/workflow.go:370 +0x2b2
go.temporal.io/sdk/internal.(*workflowExecutor).Execute(0xc000498940, 0x1003500, 0xc000498a00, 0x0, 0xc0004f3738, 0xc68596, 0x0)
        /home/paul/go-sdk/internal/internal_worker.go:759 +0x334
go.temporal.io/sdk/internal.(*syncWorkflowDefinition).Execute.func1(0x10036c0, 0xc00009b0e0)
        /home/paul/go-sdk/internal/internal_workflow.go:491 +0xf3
^C2020/08/18 21:30:00 INFO  Stopped Worker Namespace default TaskQueue test WorkerID 465773@beast-ubuntu@

Specifications

  • Version: 0.28.0
  • Platform: Helm-chart

pauldemarco avatar Aug 19 '20 01:08 pauldemarco

This is by design. Server has special code to handle this case: https://github.com/temporalio/temporal/blob/0ae59f62c4d0282aa5b7821706540a7f049473a4/service/history/transferQueueActiveTaskExecutor.go#L456. You may also check: https://github.com/uber/cadence/pull/539 and https://github.com/uber/cadence/issues/531.

alexshtin avatar Aug 19 '20 17:08 alexshtin

You'll have to forgive me as I'm not very familiar with how signals are handled under the hood. IMHO, from an end-user perspective, it would be nice if we could SignalWorkflow + SignalWorkflowWithStart from within workflow code and not have to worry about making sure the target workflow is external. Is there a reason we have a special SignalExternalWorkflow instead of just SignalWorkflow(withStart bool) for workflow context?

Note: I'm aware of the performance implications when sending a signal to self rather than just directly placing in some internal queue. That's a decision I'd like to make for myself as an end-user. If I'm enforcing a signal-driven architecture a top temporal, than I'd like all tasks to be driven by actual Signals that are forced through temporal. Circumventing this feels like a hidden side effect and will not show in monitoring / middleware that I might utilize in the future.

pauldemarco avatar Aug 19 '20 18:08 pauldemarco

Temporal server does not have support for signal command which tries to send a signal to itself. Implementation of signal command processing is written in a way where it holds a lock on the source execution while it is writing a signal to target execution. If source and target are the same then it results in a deadlock and transfer task processing for that signal command will never succeed. This is the reason server has a protection in place to not allow sending signal to itself. I agree that it is a side effect of particular implementation choice made for processing of signal command. But I don't think this is a bug. I will convert this into a feature request which we could potentially consider to support in the future.

I still think that a workflow sending a signal to itself does not make much sense as it could simplify send a message through a channel. I don't see why it requires a full round trip back from server just to deliver a message on signal channel.

samarabbas avatar Aug 20 '20 06:08 samarabbas

Probably we should improve error message there though. Report something like WorkflowCantSignalItself instead of UnknownExternalWorkflowExecution we currently have.

alexshtin avatar Aug 20 '20 17:08 alexshtin

@samarabbas that makes sense. I think the only reason would be to have clear metrics. For instance, Signals per minute in Grafana would not be a true representation of a signal driven architecture if a significant amount of self signals are not being sensed.

I have things working for now using a local workflow channel, and piping into that from Signal channel and local self signals.

I would appreciate having this as an enhancement, but if it’s a serious anti pattern to the way things are working under the hood I definitely would not make sacrifices for it.

Thanks.

pauldemarco avatar Aug 20 '20 18:08 pauldemarco

Understood. I have converted this to a future enhancement which we can re-evaluate later. For now looks like you have a reasonable workaround which is working.

samarabbas avatar Aug 20 '20 22:08 samarabbas

Hi @samarabbas, any movement on this? I've just encountered the same problem, and would love to help work towards a solution here

mattpercy-anz avatar Dec 19 '22 10:12 mattpercy-anz