Newly registered namespace not available when polling workflow/activity tasks
## Expected Behavior
Before startup, check whether a specific namespace is available; if it is not, create the namespace and then register activities and workflows.
After the workflows/activities are registered, the pollers should start correctly without errors.
## Actual Behavior
After registration of the namespace, all of the pollers report the following stack trace:
```
2021-02-04 10:14:12.422 ERROR 76618 --- [kflow Poller: 5] io.temporal.internal.worker.Poller : Failure in thread Host Local Workflow Poller: 5
io.grpc.StatusRuntimeException: NOT_FOUND: namespace: [MY_NAMESPACE] not found
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[grpc-stub-1.34.1.jar:1.34.1]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[grpc-stub-1.34.1.jar:1.34.1]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[grpc-stub-1.34.1.jar:1.34.1]
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollActivityTaskQueue(WorkflowServiceGrpc.java:2696) ~[temporal-serviceclient-1.0.4.jar:na]
at io.temporal.internal.worker.ActivityPollTask.poll(ActivityPollTask.java:105) ~[temporal-sdk-1.0.4.jar:na]
at io.temporal.internal.worker.ActivityPollTask.poll(ActivityPollTask.java:39) ~[temporal-sdk-1.0.4.jar:na]
at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:265) ~[temporal-sdk-1.0.4.jar:na]
at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:241) ~[temporal-sdk-1.0.4.jar:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:834) ~[na:an]
```
## Steps to Reproduce the Problem
1. Create the namespace:
```kotlin
val registerNamespaceRequest = RegisterNamespaceRequest.newBuilder()
    .setNamespace(namespace)
    .setWorkflowExecutionRetentionPeriod(Durations.fromDays(retentionPeriod))
    .build()
service.blockingStub().registerNamespace(registerNamespaceRequest)
```
2. Start/register the workflow:
```kotlin
val workflowClientOptions = WorkflowClientOptions.newBuilder().setNamespace(namespace).build()
client = WorkflowClient.newInstance(service, workflowClientOptions)
val factory = WorkerFactory.newInstance(client)
val workerForCommonTaskQueue = factory.newWorker(TASK_QUEUE)
workerForCommonTaskQueue.registerWorkflowImplementationTypes([MyWorkflowImpl]::class.java)
workerForCommonTaskQueue.registerActivitiesImplementations([MyActivities])
```
Closing the connection and inserting delays between registering the namespace and registering the workflow seems to help, but not consistently.
Calls to `reportedNamespace?.isInitialized` and `reportedNamespace?.namespaceInfo?.state` report that the namespace is initialized and Registered.
## Specifications
- Version: SDK 1.0.5
- Platform: Temporal -> 1.4.0, 1.6.1
- Env: Kotlin 1.4, JVM 11
As we discussed with @mmcshane, createNamespace, listNamespaces and other namespace-focused APIs read and write directly to the database, while other APIs (like activity/workflow worker long polls) use a namespace cache. This cache is neither a read-through nor a write-through cache, and it takes about 10s to be updated.
As a result, there is not only a usability issue, with users unable to use a namespace right after creating it, but we also seem to lack a simple API to check whether the namespace has actually propagated to the cache, because the namespace-centric APIs don't use the cache.
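To make the gap concrete, here is a toy model of the behaviour described above. It is only a sketch: `DB`, `CACHE` and every function name are made up for illustration and are not Temporal code or APIs. Writes and namespace reads go to the store, the poll path only looks at the cache, and the cache catches up only when a refresh runs.

```shell
#!/bin/bash
# Toy model only: DB stands in for the persistence store and CACHE for the
# namespace cache. All names here are hypothetical, not Temporal APIs.
declare -A DB=() CACHE=()

# Namespace-focused APIs write to and read from the store directly.
register_namespace() { DB["$1"]=1; }
describe_namespace() { [[ -n "${DB[$1]:-}" ]]; }

# The poll path consults only the cache, so it can miss a fresh namespace.
poll_task_queue() { [[ -n "${CACHE[$1]:-}" ]]; }

# Periodic refresh: rebuild the cache from the store.
refresh_cache() {
  local name
  CACHE=()
  for name in "${!DB[@]}"; do
    CACHE["$name"]=1
  done
}
```

In this model, `register_namespace myns` followed by `describe_namespace myns` succeeds immediately, while `poll_task_queue myns` keeps failing until `refresh_cache` has run, which mirrors the NOT_FOUND errors from the pollers.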
I stumbled upon the same issue just now while upgrading from 1.13.1 to 1.16.1. I was going to open a new issue, but then I found this one, which I think has the same underlying cause.
It is easy to reproduce with a simple register-namespace followed by a describe-namespace operation using tctl (although it first happened to me using the Java SDK). According to a previous comment this should work, because these operations are supposedly not cached, but reality says otherwise.
How to reproduce:
- Check out the Temporal docker-compose project (Temporal 1.16.1 at the time of writing this comment).
- Start up any variant, for example `docker-compose -f docker-compose-postgres.yml up`.
- Once it is up and ready, create and describe a namespace using, for example, this helper script:
```bash
#!/bin/bash
# file: create_namespace.sh
VALUE_NS="${1:-testns}"
tctl() {
  docker exec temporal-admin-tools tctl "$@"
}
tctl --namespace "$VALUE_NS" namespace register
until tctl --namespace "$VALUE_NS" namespace describe ; do
  echo " *** Retrying... *** "
  sleep .5
done
```
- Execute `./create_namespace.sh myns`
You will see output like:
```
Namespace myns successfully registered.
Error: Operation DescribeNamespace failed.
Error Details: rpc error: code = NotFound desc = Namespace name "myns" not found
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
 *** Retrying... ***
```
It will eventually succeed within 2-10 seconds.
In 1.13.x this issue never shows up. All versions from 1.14.x up to the current 1.16.1 are affected. The results are the same with tctl and with the Java SDK, at least.
You can edit the .env file to try with different Temporal versions easily.
As a workaround, you can implement polling logic or add an artificial delay before you start using your new namespace, although both approaches are equally undesirable.
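For the polling variant, a bounded retry is probably safer than an open-ended loop. Below is a minimal sketch of such a helper; `wait_until` is a hypothetical name, not part of tctl or any SDK, and the timeout and interval values are up to you. Note that, per the earlier comment about the cache, a successful describe may not guarantee on every version that the pollers already see the namespace.

```shell
#!/bin/bash
# Retry a command until it succeeds or a deadline passes.
# usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# (hypothetical helper, not part of tctl or the Temporal SDKs)
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

With the `tctl` wrapper from the script above, this could be used as `wait_until 30 0.5 tctl --namespace myns namespace describe`, which gives up with a non-zero exit code instead of looping forever if the namespace never appears.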
This is a known issue, and it is not easy to fix because of complications around the cross-namespace replication failover notification requirement. The good news is that we are working on a fix. We are currently restructuring the background task processing logic, which will remove the requirement for reliable notification of namespace failover. After that, we can make the namespace cache read-through and the problem will go away. The ETA for this is about 3 months, mainly because the rework of the task processing logic is a big undertaking.
For now, you have to add a 10s delay after you register a new namespace. We are sorry about the inconvenience, but we are actively working on it.
This is fixed by https://github.com/temporalio/temporal/pull/3908