baseplate.go icon indicating copy to clipboard operation
baseplate.go copied to clipboard

thriftbp/httpbp: Remove tracing from default middlewares

Open fishy opened this issue 1 year ago • 3 comments

Servers/clients should manually add v2 tracing middlewares if needed, or switch to v2 servers/clients.

fishy avatar Sep 24 '24 21:09 fishy

should manually add v2 tracing middlewares if needed

This change could be jarring for service owners to have tracing unceremoniously dropped and need to scramble to understand why their deployments are suddenly missing observability information.

mathyourlife-reddit avatar Sep 25 '24 14:09 mathyourlife-reddit

Metrics are their own middleware, not relying on the span framework right?

correct

fishy avatar Sep 25 '24 15:09 fishy

👍

This change could be jarring for service owners to have tracing unceremoniously dropped and need to scramble to understand why their deployments are suddenly missing observability information.

I think this change should be fine? IIRC this implementation of tracing doesn't actually do anything anymore?

So we tried to add limited interop to make some of the tracing v0 work auto translate to new tracing system, but the limited interop will cause it to spam logs for server and client spans, as in the tracing v0 implementation we have the assumption that the spans are of a certain type (from v0 implementation) that's broken with the interop. that assumption being broken also likely means that server and ~local~ client spans are not really interop'd into new tracing system correctly (I don't think anyone actually tested/verified that, we only verified that local spans interop works as expected).

so removing the default middlewares actually helps us to enable interop for local spans (local spans are not implemented in the new tracing library so v0 interop is actually the only way to use them right now), without the fear of spamming the logs from server and client spans.

fishy avatar Sep 25 '24 17:09 fishy

After chatting with @redloaf, my understanding is that the httpbpv2 and thriftbpv2 libraries are in alpha and almost ready to ship. The cleanest path is likely to push for services to upgrade to those libraries to get tracing. My biggest hangup on that is GQL. GQL is actively using the http server, and also the thrift and http clients. Breaking tracing in GQL would be a big step backward for us.

I need to look a bit closer, but maybe we can fix the casting logic to work with the bridge spans in the interim?

trevorriles avatar Oct 04 '24 14:10 trevorriles

@trevorriles A few things I want to emphasize here (some are moved from the slack thread):

  1. The middlewares I removed from being default all call AsSpan. They call AsSpan because opentracing API is not enough for what we need them to do so we need to get the underlying raw span to achieve what we need to achieve. But since interop is relying on opentracing API, I highly suspect those interop don't really work, so removing them should not put any dent on the interop.
  2. This PR just removes some middlewares from being the default, the middlewares themselves are still there, and if some of them are actually working with the interop, service owner can still add them on top of the default ones to keep using them.

fishy avatar Oct 04 '24 18:10 fishy

Should the InjectServerSpan logic already be short circuited by this code when using the transition library, without the need to remove it entirely?

https://github.com/reddit/baseplate.go/blob/c7e065837fc5906607a660629fb6f9cb08e3247d/thriftbp/server_middlewares.go#L149-L151

Removing them won't break interop, but I think as @kylelemons mentioned it would break the best-effort support of passing through services that are not using the interop library yet.

trevorriles avatar Oct 07 '24 14:10 trevorriles

Should the InjectServerSpan logic already be short circuited by this code when using the transition library, without the need to remove it entirely?

https://github.com/reddit/baseplate.go/blob/c7e065837fc5906607a660629fb6f9cb08e3247d/thriftbp/server_middlewares.go#L149-L151

Removing them won't break interop, but I think as @kylelemons mentioned it would break the best-effort support of passing through services that are not using the interop library yet.

Oh I didn't know/notice we have those short-circuit.

So I think the problem is that we didn't short-circuit on the client middleware which caused the log spam (and we may or may not want to add server/client short-circuit for httpbp)

basically we need to either add short-circuit or remove the middleware from default for server/client middlewares on thriftbp/httpbp v0.

fishy avatar Oct 07 '24 16:10 fishy

@trevorriles ok I created https://github.com/reddit/baseplate.go/pull/665 to solve this problem differently.

fishy avatar Oct 07 '24 17:10 fishy