spring-boot icon indicating copy to clipboard operation
spring-boot copied to clipboard

Migrate to Brave 6 and Zipkin Reporter 3

Open codefromthecrypt opened this issue 2 years ago • 19 comments

This updates to Brave 6 and Zipkin Reporter 3 which are compatible with the classpath of Otel 1.35

Here are some new features for end-users, applicable to both Otel and Brave senders:

  • Use protobuf by setting the property "management.zipkin.tracing.encoding" to "PROTO3"
  • Integrate a dynamic zipkin endpoint source by providing a bean HttpEndpointSupplier.Factory
    • This will allow spring-cloud-loadbalancer or discovery integration like sleuth had, but also for otel
    • The input to this function is ZipkinConnectionDetails.getSpanEndpoint()
    • When absent ZipkinConnectionDetails.getSpanEndpoint() is called once like before.

codefromthecrypt avatar Jan 07 '24 08:01 codefromthecrypt

@wilkinsona I think we can get away from all these deprecated things, sorry about the distraction. I'll mark draft and triple check!

codefromthecrypt avatar Jan 08 '24 12:01 codefromthecrypt

should be an easier review, now. I had to add the zipkin-reporter bom not because of code in this change, but because Brave 6 no longer manages zipkin-reporter (Brave 6 has no dependency on zipkin types). spring-boot-actuator-autoconfigure defines reporter types, so needs this for those to compile. When that code updates to Zipkin Reporter 3 is up to you, I'm fine helping soon, but if not this week I can't promise availability as this stuff isn't related to my dayjob anymore.

codefromthecrypt avatar Jan 09 '24 03:01 codefromthecrypt

actually update to zipkin-reporter v3 will need to wait until this is released, as it seems boot is also importing otel exporter for OpenTelemetryConfiguration (note brave doesn't actually use this type). Anyway, migration to v3 is changing packages, so anyone can do it similar to this https://github.com/openzipkin/zipkin-aws/pull/208

codefromthecrypt avatar Jan 09 '24 04:01 codefromthecrypt

We can't upgrade to Brave 6.0 until Micrometer has done so.

wilkinsona avatar Jan 09 '24 08:01 wilkinsona

Soft bump 🙏🏾 hope everyone is having a lovely day!

efemoney avatar Feb 09 '24 05:02 efemoney

@efemoney I think the current status is micrometer are awaiting https://github.com/open-telemetry/opentelemetry-java to release 1.35.0. Then micrometer can update to that and brave 6 etc per notes here. Once micrometer is converged, then this one can progress. Finally, we can basically not do this again for several years :)

TL;DR; waiting on otel to release and I don't know the schedule of that

codefromthecrypt avatar Feb 09 '24 07:02 codefromthecrypt

Thank you. I checked the micrometer PR but somehow missed that 🙌🏾

efemoney avatar Feb 09 '24 07:02 efemoney

turns out https://github.com/micrometer-metrics/tracing/pull/536 only affected test deps as there was no main code change. So, there's no inter-dependency afaict except maybe if otel latest is needed here. Anyway I will sort this out next!

codefromthecrypt avatar Feb 10 '24 09:02 codefromthecrypt

ok there's probably a lint or some test glitch, but I'm time out for the day.

Effectively, we don't want to depend on zipkin core (zipkin.Reporter) types anymore, so detecting Brave already configured if there is an AsyncZipkinSpanHandler rather wrapping and bringing back a zipkin dep with AsyncReporter

Also, almost everyone uses json and there's no built-in set of encoders in brave. So, I used optional injection of an encoder for both use cases brave and otel. The only known custom user of that encoder would be zipkin-gcp, and again lacking any built-in choice except json, optional is the best way.

Feedback welcome, but I think you can see where I'm going with this and it is a lot simpler to implement.

codefromthecrypt avatar Feb 10 '24 10:02 codefromthecrypt

encoder swapping works, but marking draft as finally have tests running locally. I need to look into these:

ZipkinAutoConfigurationIntegrationTests > zipkinsUseOfRestTemplateDoesNotCauseACycle() FAILED
    java.lang.AssertionError at ZipkinAutoConfigurationIntegrationTests.java:52

ZipkinAutoConfigurationIntegrationTests > zipkinsUseOfWebClientDoesNotCauseACycle() FAILED
    java.lang.AssertionError at ZipkinAutoConfigurationIntegrationTests.java:59

ZipkinAutoConfigurationTests > definesPropertiesBasedConnectionDetailsByDefault() FAILED
    java.lang.AssertionError at ZipkinAutoConfigurationTests.java:39

ZipkinAutoConfigurationTests > shouldUseCustomConnectionDetailsWhenDefined() FAILED
    java.lang.AssertionError at ZipkinAutoConfigurationTests.java:52

codefromthecrypt avatar Feb 11 '24 01:02 codefromthecrypt

Fyi: Micrometer Tracing 1.3.0-M1 has the Brave 6 (+ OTel 1.35) upgrade.

jonatan-ivanov avatar Feb 12 '24 23:02 jonatan-ivanov

@jonatan-ivanov cool I will mark this one final today, just sorting out one glitch this morning.

codefromthecrypt avatar Feb 12 '24 23:02 codefromthecrypt

almost done.. sorry I meant to finish yesterday, but 100pct will today.

codefromthecrypt avatar Feb 14 '24 04:02 codefromthecrypt

ok I've updated the description and also want to thank @reta and @anuraaga who spent the last several days with me making sure the baseline library is high quality.

Particularly, we have a replacement for how to integrate eureka at a lower level (the sender) so that brave and otel can use it. Basically, eureka integration is pinned to sleuth (boot 2.x) until this is out.

I'm doing some final integration tests of micrometer tracing around this, and following that will remove the snapshot version of reporter for a real one.

codefromthecrypt avatar Feb 14 '24 07:02 codefromthecrypt

if someone can click approve on the workflow, as far as I know this is done except getting rid of the snapshot dep. I'll do more integration testing meanwhile, but curious if the official CI is happy.

codefromthecrypt avatar Feb 14 '24 10:02 codefromthecrypt

ok I moved to release version and here's the bit that works in a separate project and re-routes endpoint to loadbalancer like sleuth once did. cc @spencergibb @OlgaMaciaszek @marcingrzejszczak @jonatan-ivanov @shakuzen

I'm sure you'd do it differently, but anyway here it is!

package brave.example;

import java.net.URI;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.loadbalancer.LoadBalancerClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import zipkin2.reporter.HttpEndpointSupplier;
import zipkin2.reporter.HttpEndpointSuppliers;

@Configuration(proxyBeanMethods = false)
public class ZipkinDiscoveryConfiguration {
  @Bean HttpEndpointSupplier.Factory loadbalancerEndpoints(LoadBalancerClient loadBalancerClient) {
    LoadBalancerHttpEndpointSupplier.Factory httpEndpointSupplierFactory =
        new LoadBalancerHttpEndpointSupplier.Factory(loadBalancerClient);
    // don't ask more than 30 seconds (just to show)
    return HttpEndpointSuppliers.newRateLimitedFactory(httpEndpointSupplierFactory, 30);
  }

  record LoadBalancerHttpEndpointSupplier(LoadBalancerClient loadBalancerClient, URI virtualURL)
      implements HttpEndpointSupplier {
    record Factory(LoadBalancerClient loadBalancerClient) implements HttpEndpointSupplier.Factory {

      @Override public HttpEndpointSupplier create(String endpoint) {
        return new LoadBalancerHttpEndpointSupplier(loadBalancerClient, URI.create(endpoint));
      }
    }

    @Override public String get() {
      // At least spring-cloud-netflix wants the actual hostname as a lookup value
      ServiceInstance instance = loadBalancerClient.choose(virtualURL.getHost());
      if (instance != null) {
        return instance.getUri() + virtualURL.getPath();
      }
      throw new IllegalArgumentException(
          virtualURL.getHost() + " is not an instance registered in Eureka");
    }

    @Override public void close() {
    }

    @Override public String toString() {
      return "LoadBalancer{" + virtualURL + "}";
    }
  }
}

codefromthecrypt avatar Feb 14 '24 12:02 codefromthecrypt

Thanks a lot for the PR, there are some cool features in there! :)

mhalbritter avatar Feb 14 '24 13:02 mhalbritter

rebased with feedback from @mhalbritter accepted! 🤞 on green

codefromthecrypt avatar Feb 14 '24 23:02 codefromthecrypt

red looks unrelated unless I'm missing something

codefromthecrypt avatar Feb 15 '24 09:02 codefromthecrypt

rebased. let me know what else is needed here!

codefromthecrypt avatar Feb 19 '24 23:02 codefromthecrypt

same snapshot download error as before..

codefromthecrypt avatar Feb 20 '24 01:02 codefromthecrypt

Don't worry. I'll take a look at the PR and merge it. It builds fine on my machine.

mhalbritter avatar Feb 20 '24 12:02 mhalbritter

Thanks a lot for the PR!

mhalbritter avatar Feb 20 '24 12:02 mhalbritter

thanks @mhalbritter .. one less thing on my nag list, but more importantly a big step forward for those using this stuff. cheers!

codefromthecrypt avatar Feb 20 '24 22:02 codefromthecrypt