workflow-step-api-plugin icon indicating copy to clipboard operation
workflow-step-api-plugin copied to clipboard

Add `SynchronousNonBlockingStepExecution.ExecutorServiceAugmentor` extension point

Open jgreffe opened this issue 9 months ago • 11 comments

Allow the executorService to be augmented. This avoids having to use reflection by dependent plugins.

See https://github.com/jenkinsci/opentelemetry-plugin/pull/1139 for the consumer.

Updated with comment from @dwnusbaum

opentelemetry wants to make it possible for other plugins to augment the traces produced for Jenkins builds. See jenkinsci/opentelemetry-plugin#896 and jenkinsci/opentelemetry-plugin#909 for more context. See jenkinsci/junit-sql-storage-plugin#413 for an example of a plugin using the OTel context to emit custom spans as part of the junit step.

In order for this to work, plugins need access to an OTel Context in their build steps and/or related code so that they can produce a Span with the correct parent. By default, OTel contexts are stored as a ThreadLocal variable. opentelemetry's synchronous GraphListener implementation sets up this context on the CPS VM thread when a new step starts. For Pipeline steps with fully synchronous StepExecutions, that is enough for context propagation to work correctly within the extent of StepExecution.start. However, for steps whose execution runs on another thread, even if they complete synchronously from the perspective of the Pipeline (e.g. SynchronousNonBlockingStepExecution, GeneralNonBlockingStepExecution), the context will be unavailable, and so any created OTel span will not have the correct parent.

opentelemetry plugin currently handles this case by reflectively modifying SynchronousNonBlockingStepExecution.executorService to wrap it with io.opentelemetry.context.Context.taskWrapping, which automatically propagates the thread local OTel context to tasks run on the executor service. This works, but this kind of reflection is brittle, and we would like to support this use case in a more maintainable way. This PR currently just exposes a @Restricted(Beta.class) API to allow opentelemetry to augment SynchronousNonBlockingStepExecution.executorService with Context.taskWrapping without using reflection.

There may be other ways to accomplish the same use case, but we have not explored them at this time: For example:

  • Perhaps we should instead encourage plugins with build steps that want to support tracing to always look up their parent span using these methods and then set up the context manually as needed? For junit-sql-storage this would mean that the junit step would need to be updated to look up the span and then set up the thread local context for methods in the custom storage provider.
  • Similarly, could we somehow inject the OTel Context into the StepContext for each step automatically, and then have some kind of method like OtelPlugin.useContext(StepContext) that sets up the thread local for plugins that need it? This would only work for Pipeline Steps, but maybe that's ok?
  • There is a ContextStorageProvider API. Perhaps we could implement this in a way that works better for our use case? EDIT: Or perhaps we could invert the dependency as described in: https://github.com/jenkinsci/workflow-step-api-plugin/pull/226#issuecomment-3032799853

Testing done

Submitter checklist

  • [x] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • [x] Ensure that the pull request title represents the desired changelog entry
  • [x] Please describe what you did
  • [ ] Link to relevant issues in GitHub or Jira
  • [ ] Link to relevant pull requests, esp. upstream and downstream changes
  • [x] Ensure you have provided tests that demonstrate the feature works or the issue is fixed

jgreffe avatar Jun 30 '25 16:06 jgreffe

I am not sure I follow the use case, and there is no downstream PR.

My general recommendation for any code creating an ExecutorService is

try { // TODO Java 21+
    return (ExecutorService) Executors.class.getMethod("newVirtualThreadPerTaskExecutor").invoke(null);
} catch (NoSuchMethodException x) {
    return Executors.newCachedThreadPool(/* as before */);
} catch (Exception x) {
    throw new AssertionError(x);
}

which should probably be made into an API in Jenkins core.

jglick avatar Jun 30 '25 16:06 jglick

I am not sure I follow the use case, and there is no downstream PR.

My general recommendation for any code creating an ExecutorService is

try { // TODO Java 21+
    return (ExecutorService) Executors.class.getMethod("newVirtualThreadPerTaskExecutor").invoke(null);
} catch (NoSuchMethodException x) {
    return Executors.newCachedThreadPool(/* as before */);
} catch (Exception x) {
    throw new AssertionError(x);
}

which should probably be made into an API in Jenkins core.

Just filed the downstream PR: https://github.com/jenkinsci/opentelemetry-plugin/pull/1139

jgreffe avatar Jun 30 '25 17:06 jgreffe

return (ExecutorService) Executors.class.getMethod("newVirtualThreadPerTaskExecutor").invoke(null);

My (limited) understanding of the current state with virtual threads was that unless you are running homogenous tasks that you know do not use synchronized, you have to be wary pending https://openjdk.org/jeps/491 in Java 24. In Jenkins at least we often use ExecutorService for heterogenous tasks that are not known up front and synchronized is common in the ecosystem, so I wasn't sure about using virtual threads just yet. Have you found them to work fine in practice?

dwnusbaum avatar Jun 30 '25 21:06 dwnusbaum

FTR @jgreffe if you want to update the PR description here is some extra context:

opentelemetry wants to make it possible for other plugins to augment the traces produced for Jenkins builds. See https://github.com/jenkinsci/opentelemetry-plugin/pull/896 and https://github.com/jenkinsci/opentelemetry-plugin/pull/909 for more context. See https://github.com/jenkinsci/junit-sql-storage-plugin/pull/413 for an example of a plugin using the OTel context to emit custom spans as part of the junit step.

In order for this to work, plugins need access to an OTel Context in their build steps and/or related code so that they can produce a Span with the correct parent. By default, OTel contexts are stored as a ThreadLocal variable. opentelemetry's synchronous GraphListener implementation sets up this context on the CPS VM thread when a new step starts. For Pipeline steps with fully synchronous StepExecutions, that is enough for context propagation to work correctly within the extent of StepExecution.start. However, for steps whose execution runs on another thread, even if they complete synchronously from the perspective of the Pipeline (e.g. SynchronousNonBlockingStepExecution, GeneralNonBlockingStepExecution), the context will be unavailable, and so any created OTel span will not have the correct parent.

opentelemetry plugin currently handles this case by reflectively modifying SynchronousNonBlockingStepExecution.executorService to wrap it with io.opentelemetry.context.Context.taskWrapping, which automatically propagates the thread local OTel context to tasks run on the executor service. This works, but this kind of reflection is brittle, and we would like to support this use case in a more maintainable way. This PR currently just exposes a @Restricted(Beta.class) API to allow opentelemetry to augment SynchronousNonBlockingStepExecution.executorService with Context.taskWrapping without using reflection.

There may be other ways to accomplish the same use case, but we have not explored them at this time: For example:

  • Perhaps we should instead encourage plugins with build steps that want to support tracing to always look up their parent span using these methods and then set up the context manually as needed? For junit-sql-storage this would mean that the junit step would need to be updated to look up the span and then set up the thread local context for methods in the custom storage provider.
  • Similarly, could we somehow inject the OTel Context into the StepContext for each step automatically, and then have some kind of method like OtelPlugin.useContext(StepContext) that sets up the thread local for plugins that need it? This would only work for Pipeline Steps, but maybe that's ok?
  • There is a ContextStorageProvider API. Perhaps we could implement this in a way that works better for our use case?

dwnusbaum avatar Jul 01 '25 17:07 dwnusbaum

you have to be wary

Sure, it depends on what the thread is doing. Anyway that is irrelevant to the apparent purpose of this API based on the downstream PR.

jglick avatar Jul 01 '25 18:07 jglick

you have to be wary

Sure, it depends on what the thread is doing. Anyway that is irrelevant to the apparent purpose of this API based on the downstream PR.

@jglick : then should I apply your suggestion? https://github.com/jenkinsci/workflow-step-api-plugin/pull/226#issuecomment-3019888216

jgreffe avatar Jul 03 '25 15:07 jgreffe

Looking into a similar case in https://github.com/jenkinsci/opentelemetry-plugin/issues/1137, I also thought about instead adding a dependency on opentelemetry-api here and then just directly adding Context.taskWrapping unconditionally, and then deleting the relevant code in opentelemetry.

I am a little more hesitant about adding dependencies to this plugin than to github-branch-source though, and we would probably still need an indirection via an optional extension for FIPS compliance reasons. I am inclined to stick with the current approach for now, wait a bit for OTel usage to increase and for stuff like https://github.com/jenkinsci/opentelemetry-api-plugin/pull/45, and then reconsider that approach.

dwnusbaum avatar Jul 03 '25 16:07 dwnusbaum

adding a dependency on opentelemetry-api here

I would rather avoid that.

jglick avatar Jul 14 '25 18:07 jglick

@jglick , @dwnusbaum , could you have a look at the last state? 🙏🏼

jgreffe avatar Jul 16 '25 14:07 jgreffe

@jgreffe This looks more or less fine to me, but I think we should wait until we have time to discuss with the OTel maintainers in a week or so to make sure we are aligned on the various changes we have been looking into. I don't want to merge this until we are confident that the downstream PR will be accepted.

dwnusbaum avatar Jul 16 '25 14:07 dwnusbaum

Moving it to draft then @dwnusbaum

jgreffe avatar Jul 16 '25 14:07 jgreffe