Ocelot icon indicating copy to clipboard operation
Ocelot copied to clipboard

In v23.3.4 `ConsulProviderFactory` can't resolve `IConsulServiceBuilder` via `IServiceProvider`

Open Niksson opened this issue 1 year ago • 15 comments

Expected Behavior / New Feature

Service Discovery via Consul works with extended DefaultConsulServiceBuilder via AddConsul<T>()

Actual Behavior / Motivation for New Feature

When trying to send a request via Ocelot ConsulProviderFactory encounters an error:

2024-10-17 09:03:45 [07:03:45 WRN] requestId: XXX, previousRequestId: No PreviousRequestId, message: 'Error Code: UnableToFindLoadBalancerError Message: Unable to find load balancer for 'GET|/some/random/path|no-host|no-host-and-port|no-svc-ns|mailbox|LeastConnection|no-lb-key'. Exception: System.InvalidOperationException: Cannot resolve scoped service 'Ocelot.Provider.Consul.Interfaces.IConsulServiceBuilder' from root provider.
2024-10-17 09:03:45    at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier serviceIdentifier, ServiceProviderEngineScope serviceProviderEngineScope)
2024-10-17 09:03:45    at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
2024-10-17 09:03:45    at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetService[T](IServiceProvider provider)
2024-10-17 09:03:45    at Ocelot.Provider.Consul.ConsulProviderFactory.CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
2024-10-17 09:03:45    at Ocelot.ServiceDiscovery.ServiceDiscoveryProviderFactory.GetServiceDiscoveryProvider(ServiceProviderConfiguration config, DownstreamRoute route)
2024-10-17 09:03:45    at Ocelot.ServiceDiscovery.ServiceDiscoveryProviderFactory.Get(ServiceProviderConfiguration serviceConfig, DownstreamRoute route)
2024-10-17 09:03:45    at Ocelot.LoadBalancer.LoadBalancers.LoadBalancerFactory.Get(DownstreamRoute route, ServiceProviderConfiguration config)
2024-10-17 09:03:45    at Ocelot.LoadBalancer.LoadBalancers.LoadBalancerHouse.GetResponse(DownstreamRoute route, ServiceProviderConfiguration config)
2024-10-17 09:03:45    at Ocelot.LoadBalancer.LoadBalancers.LoadBalancerHouse.Get(DownstreamRoute route, ServiceProviderConfiguration config); errors found in ResponderMiddleware. Setting error response for request path:/some/random/path, request method: GET'

Steps to Reproduce the Problem

  1. Add Consul with a customized IConsulServiceBuilder:
services.AddOcelot()
  .AddConsul<ServiceAddressConsulServiceBuilder>()

The code for this builder is taken from documentation example on how to use service address as host:

  public class ServiceAddressConsulServiceBuilder : DefaultConsulServiceBuilder
{
    public ServiceAddressConsulServiceBuilder(IHttpContextAccessor contextAccessor, IConsulClientFactory clientFactory, IOcelotLoggerFactory loggerFactory) : base(contextAccessor, clientFactory, loggerFactory)
    {
    }

    protected override string GetDownstreamHost(ServiceEntry entry, Node node)
    {
        return entry.Service.Address;
    }
}
  1. Configure Ocelot and a route to use Service Discovery via Consul
  2. Try to hit this route

Specifications

  • Version: 23.3.4
  • Platform: .NET 8
  • Subsystem: Linux (Alpine)

Workarounds:

I was able to work around this issue by reregistering the IConsulServiceBuilder as transient:

services.AddOcelot()
    .AddConsul();

services.RemoveAll<IConsulServiceBuilder>();
services.AddTransient<IConsulServiceBuilder, ServiceAddressConsulServiceBuilder>();

Niksson avatar Oct 17 '24 07:10 Niksson

Hi!

Just a little clarification - I encountered this error in a production project, so I'm not sure if the steps to reproduce accurately represent how to reproduce the problem. I will try and find time to create a sample project to reproduce this error. Hope I can do it soon.

Niksson avatar Oct 17 '24 07:10 Niksson

Hello!

Workarounds:

I was able to work around this issue by reregistering the IConsulServiceBuilder as transient:

services.AddOcelot().AddConsul();
services.RemoveAll<IConsulServiceBuilder>();
services.AddTransient<IConsulServiceBuilder, ServiceAddressConsulServiceBuilder>();

It appears you have a customized solution. Your issue is that adding a default scoped service is not suitable for your environment, as seen here: (https://github.com/ThreeMammals/Ocelot/blob/ce0227c059f220761acc2d880e554174bdb3c544/src/Ocelot.Provider.Consul/OcelotBuilderExtensions.cs#L28) Scoped context refers to the context of the current request. Transient injection is not very practical since service injection occurs in just one class. Therefore, we request you to upload the complete solution as it is customized. Kindly upload the Ocelot application to your account for a comprehensive review by our team❗

P.S.

It appears to be a concurrency issue, as the domain services for this feature are singletons, including the factory class... but I'm not sure.

raman-m avatar Oct 17 '24 10:10 raman-m

I cannot upload the solution as is, since it's a private one. As I said, I will try and create a sample project to see if the error reproduces.

Niksson avatar Oct 17 '24 11:10 Niksson

@Niksson commented on Oct 17

Just a little clarification - I encountered this error in a production project, so I'm not sure if the steps to reproduce accurately represent how to reproduce the problem.

That's unfortunate. How frequently does this error occur? Is it happening with every request or just sporadically?

Additionally, based on your description, the JSON of the app is not attached. Could you please provide the contents of your ocelot.json file?

raman-m avatar Oct 17 '24 11:10 raman-m

While I'm at it, I will note this, from Ocelot's source code:

public class ServiceDiscoveryProviderFactory : IServiceDiscoveryProviderFactory // This is registered as Singleton
    {
        public ServiceDiscoveryProviderFactory(IOcelotLoggerFactory factory, IServiceProvider provider)
        {
            // We inject an IServiceProvider here
            _provider = provider;
            _delegates = provider.GetService<ServiceDiscoveryFinderDelegate>();
           //...
        }

And in the method passed as a delegate:

// Also registered as Singleton
private static IServiceDiscoveryProvider CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
    {
        // ...

       // IConsulServiceBuilder is Scoped, and Scoped services usually cannot be resolved from Singletons
        var serviceBuilder = provider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

In singleton services you usually need to create a scope to resolve scoped services, for example:

private static IServiceDiscoveryProvider CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
    {
        // ...

        var scope = provider.CreateScope();
        var serviceBuilder = scope.ServiceProvider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

I've found a related article - https://samwalpole.com/blog/using-scoped-services-inside-singletons/index.html

Just to be clear - I'm still working on providing a sample project to help you with investigation

Niksson avatar Oct 17 '24 11:10 Niksson

@raman-m The error occurs on every request. I need to strip the ocelot.json of private info before sending it to you, but I will try to do it with the sample project

Niksson avatar Oct 17 '24 11:10 Niksson

Please recognize that this is an open-source project. Stating "I cannot share code or files" and not revealing all parts of the application implies that we are unable to assist you.

Just to be clear - I'm still working on providing a sample project to help you with investigation

Understood... best of luck! We have other significant tasks to prioritize.

raman-m avatar Oct 17 '24 11:10 raman-m

In singleton services you usually need to create a scope to resolve scoped services, for example:

private static IServiceDiscoveryProvider CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
        // ...
        var scope = provider.CreateScope();
        var serviceBuilder = scope.ServiceProvider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

I've found a related article - https://samwalpole.com/blog/using-scoped-services-inside-singletons/index.html

Okay, thanks! That could be the root cause. But why is it impossible to reproduce by our tests? Other developers have informed that the 23.3.4 fix is functioning, which is peculiar. It appears the issue arises in specific types of solutions with numerous customizations.

Have you implemented this fix in your application? Is it functioning properly?


Just to be clear - I'm still working on providing a sample project to help you with investigation

If your production environment is unstable with version 23.3.4, it is advisable to revert to version 23.2.2 immediately.

raman-m avatar Oct 17 '24 11:10 raman-m

Hi again @raman-m! I've created a sample project that includes an Ocelot configuration - https://github.com/Niksson/OcelotSampleProject. I've transfered most of the customizations we made just for the sake of the example, and I could reproduce the error.

This solution uses docker-compose (mostly for running Consul locally). I would advise using it if possible. In the projects you can also find .http files to make requests to the sample API and gateway.

Feel free to ask questions if there are any. Also, there may be errors in ocelot configuration files (specifically ocelot.global.json - I left them as they are in the production code.

Niksson avatar Oct 17 '24 13:10 Niksson

Have you implemented this fix in your application? Is it functioning properly?

No, not yet, I only worked it around using Transient lifetime. I will try referencing the Ocelot source locally and see if I can fix the error by creating the DI scope

Niksson avatar Oct 17 '24 13:10 Niksson

If your production environment is unstable with version 23.3.4, it is advisable to revert to version 23.2.2 immediately.

Yes, we've already done that to be able to keep working

Niksson avatar Oct 17 '24 13:10 Niksson

I will try referencing the Ocelot source locally and see if I can fix the error by creating the DI scope

So I've tried two things, both make the error go away:

  • Creating a new scope:
var scope = provider.CreateScope();
var serviceBuilder = scope.ServiceProvider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder
  • Using the service provider of the HttpContext, which AFAIK should be scoped to the HTTP request:
var requestScopeServices = contextAccessor.HttpContext.RequestServices;
var serviceBuilder = requestScopeServices.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

Niksson avatar Oct 17 '24 13:10 Niksson

I had a look at tests and I found out why they don't detect the error.

Apparently, when you create a web app using WebHost.CreateDefaultBuilder(string[] args) or WebApplication.CreateBuilder(string[] args) then the scope validation is turned on in Development mode (see this article). If you create a webhost or service collection in any other way, then the scope validation isn't enabled at all

Let's look at the unit test class for example:

public class ProviderFactoryTests
{
    private readonly IServiceProvider _provider;

    public ProviderFactoryTests()
    {
        // Setup code
       services.AddSingleton(contextAccessor.Object);
       services.AddSingleton(consulFactory.Object);
       services.AddSingleton(loggerFactory.Object);
       // We also register IConsuleServiceBuilder, since it's not registered and GetService<T> will just return null otherwise
       var consulServiceBuilder = new Mock<IConsulServiceBuilder>();       
       services.AddScoped(_ => consulServiceBuilder.Object);
       
       _provider = services.BuildServiceProvider();
    }
...

This still doesn't break the test, because by default scope validation is disabled. Or in the acceptance test:

    public void GivenOcelotIsRunningWithServices(Action<IServiceCollection> configureServices, Action<IApplicationBuilder> configureApp)
    {
        _webHostBuilder = new WebHostBuilder()
            .ConfigureAppConfiguration(WithBasicConfiguration)
            .ConfigureServices(configureServices ?? WithAddOcelot)
            .Configure(configureApp ?? WithUseOcelot);
        _ocelotServer = new TestServer(_webHostBuilder);
        _ocelotClient = _ocelotServer.CreateClient();
    }

The scope validation here is also not enabled by default.


This is something I didn't know about ASP.NET defaults, so TIL!

Niksson avatar Oct 17 '24 14:10 Niksson

You've conducted thorough research, but what is your plan for resolving this? Will you open a PR to address the issues in both ConsulProviderFactory and the testing helpers? It appears we've pinpointed the root causes, so now we can proceed with creating a PR. I assure you that a hotfix PR will be opened, and I will release patch for v23.3.4 as soon as possible.

P.S. Release branch for all 23.3.0+ versions → release/23.3

raman-m avatar Oct 18 '24 09:10 raman-m

Hi @raman-m, There is the same issue with k8s provider. (Invalid scopes with error appearing only in dev mode) But there KubeApiClient has scoped lifetime. Acceptance test is passing because scope validation turned off and KubeClient replaced with singleton. Found related issue without actions. As a workaround IKubeApiClient can be registered as singleton in first place(anyway it's trapped inside singleton) Probably this should be a separate issue 🤔

kick2nick avatar Oct 20 '24 09:10 kick2nick

Hi everyone!

Probably this should be a separate issue 🤔

I agree that refactoring should be another issue, but for quick fixing we can do one of these 2 options:

  • Resolve the relevant IConsulServiceBuilder from the HttpContext scope. I don't have a deep knowledge on IHttpContextAccessor, but my intuition is that the HttpContext instance is a context of the current HTTP request. This would require to fix the tests too
  • Register IConsulServiceBuilder as Singleton for each provider. Which is simpler but probably defeats the purpose of the initial code.

@kick2nick what are your thoughts?

I can definitely create a PR for Consul provider, since I can easily test it now that I have a sample, but I'm not so sure about other providers.

Niksson avatar Oct 21 '24 08:10 Niksson

@kick2nick commented on Oct 20 Thank you for your research on the code! Indeed, the DI issue has been persistent for years, yet it remains unaddressed. The logic involved is quite complex, and a contributor would need an in-depth understanding of the Ocelot codebase to propose a fix.

(Invalid scopes with error appearing only in dev mode)

Typically, we overlook development mode, but your research has revealed a design flaw. Our team has been aware of this issue for over a year, yet we lack the capacity to address everything. The inconsistencies in tests must be resolved. Ocelot should operate correctly in both Development and Release modes.

Probably this should be a separate issue 🤔

Yes, it should! Please transition to discussion #977, which has been reopened today. You may follow this issue, and any upcoming PR should be based on the code from this issue's PR. So, both PRs must be synced somehow....

raman-m avatar Oct 21 '24 08:10 raman-m

@Niksson commented on Oct 21

OK, man...

@kick2nick what are your thoughts?

In fact, you should seek my opinion.

I can definitely create a PR for Consul provider, since I can easily test it now that I have a sample

Alright, proceed! If it takes one day of development, that's fine; go ahead and open a PR. However, if it's going to take up to a week, I can't wait that long. Keep in mind that I'm focused on two issues:

  • Fixing the test helpers and relevant tests, and adding new tests that reproduce the exception with the Consul provider. Testing helpers must validate DI-container scopes always!
  • Implementing a fix in the ConsulProviderFactory. It's important to remember that the service must remain scoped (neither singleton nor transient).

but I'm not so sure about other providers.

Please disregard other discovery providers and concentrate solely on Consul and PollConsul. Thank you!

raman-m avatar Oct 21 '24 09:10 raman-m

@raman-m got it, thanks for the input! I will try to create a PR today.

Niksson avatar Oct 21 '24 09:10 Niksson