Event API calls should loop back to log frameworks without cycles
Currently, there are two sources of data for the log SDK:
- Log appenders for log4j, logback, etc. bridge log records recorded via their respective APIs:
log framework -> log bridge API -> log SDK
These logs are probably already written to the console, files, or some other local location. Most users configure the log bridge to export logs to a network location via OTLP in addition to their existing local logging. Fair enough. Makes sense.
- Event API records data directly from instrumentation.
instrumentation -> event API -> event SDK -> log SDK
As of today, it's only really easy to send these log records to a network location via OTLP. But people aren't going to want to use the event API for instrumentation if there isn't a good story for making that data available to the user via local logs. And given that the log SDK is explicitly designed not to reinvent the wheel of existing log frameworks, we need a way for log records recorded via the event API to be bridged back to existing log frameworks.
If "log appender" is the name we give to a bridge from a log API / framework into the OpenTelemetry log bridge, what do we call the thing that goes the other way?
How do we avoid loops? If a user has configured logback with the logback appender to bridge into the OpenTelemetry log bridge, and we configure a bridge for event API log records back to logback, we need some sort of marker to avoid infinite loops:
event API -> event SDK -> log SDK -> logback bridge -> logback framework
logback framework -> logback appender -> log bridge API -> log SDK -> logback bridge -> logback framework (cycle)
I brought this up a while back here and can't find any issue with folks talking about it.
Triage notes: seems reasonable, though we'd like community feedback about whether people need / want to send events back to logging frameworks like Log4J, in addition to their logging destinations.
We say:
- we don't have a log API because:
- there's a lot of prior art and there's no point in further bifurcating ecosystems
- there's a ton of surface area around configuring how logs are handled (console, file rotation, myriad network exporters, etc.) and recreating all of that is not a good use of time
- events are just a particular kind of log
- use our event API in your instrumentation to make it easy to record events
If we don't make it easy for event API logs to participate in the rich existing ecosystem, then events are not just a particular kind of log. Instead, they are a more limited type of log which only has good tooling to export to network locations via OTLP.
I don't think libraries will want to use the event API for instrumentation if we don't have good tooling for the data to end up in their user's existing log ecosystem. Why would a library adopt the event API instead of slf4j?
@jack-berg thanks for raising this, as it came out of some of my feedback. I think one reason this hasn't bitten people yet is lack of adoption of the event API.
I agree that what we are talking about is pretty much guaranteed to happen, and solving it now vs. waiting for more folks to run into it are both valid choices. I suspect there is some prior art even in existing log libraries, in cases where they can accidentally create cycles somehow. Have you seen anything?
It would be annoying for sure if a user attempted to use their local logback bridge and then needed to wait for a specification to be written and implemented before getting it to work as expected. Def appreciate you thinking ahead.
What if events could be expressed through logging API?
Instead of looping events through a log facade, should we instead have an API convention to report events through logging facades (where possible)?
E.g. with slf4j I can write something like:

```java
logger.atInfo()
    .addKeyValue("event.name", "com.foo.my-event-id")
    .addKeyValue("otel.log.body", myEventBody) // `otel.log.body` property is used as the log record body by the bridge API
    .log("something important"); // see https://github.com/open-telemetry/semantic-conventions/issues/1076
```
Or I can imagine:

```java
class MyEvent implements io.otel.events.Event {
    @Override
    public String getEventName() { return "com.foo.my-event-name"; }

    @Override
    public String toString() { ... } // for non-otel logging providers

    @Override
    public AnyValue getBody() { ... } // for otel
}

logger.atInfo()
    .addArgument(new MyEvent(...))
    .log(null);
```
I know it changes everything about how we do logs today.
Related:
- https://github.com/open-telemetry/semantic-conventions/issues/1283
- https://github.com/open-telemetry/semantic-conventions/issues/1076
I can say that such a setup is possible. I can create a prototype in Go if necessary. Otherwise, I plan to close this next week.
But maybe you want to make sure that it is not possible to have a cycle? If so, then I think this is implementation/language-specific and may not be possible everywhere. For instance, there is nothing preventing a log processor from using the logs API.
I'm planning on implementing this in Java during the last couple of weeks of the year when things go quiet 😁
Here's my writeup on how I think we solve this in Java: https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/15572
TL;DR:
- Add a new log record processor which bridges from the OpenTelemetry log SDK to SLF4J (the standard logging facade for Java)
- This allows logs recorded via OpenTelemetry log API to appear in your log framework (log4j2, logback, JUL)
- Prevent cycles by setting a flag in context (I'm calling it `otel.loopback=true` for now), and having the new processor and appenders look for the flag to prevent double recording
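A minimal, self-contained sketch of that flag idea, assuming hypothetical names throughout: none of these classes or methods come from the real OpenTelemetry SDK, and a `ThreadLocal` stands in for the `otel.loopback` entry in OpenTelemetry Context:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the loopback-flag idea; all names are invented.
public class LoopbackSketch {
    // Plays the role of the otel.loopback=true entry in Context.
    static final ThreadLocal<Boolean> OTEL_LOOPBACK = ThreadLocal.withInitial(() -> false);

    static final List<String> frameworkOutput = new ArrayList<>(); // e.g. logback's console/file output
    static final List<String> otlpOutput = new ArrayList<>();      // e.g. an OTLP exporter

    // Stand-in for logback: writes locally, then runs the OpenTelemetry appender.
    static void frameworkLog(String msg) {
        frameworkOutput.add(msg);
        appenderOnLog(msg);
    }

    // Stand-in for the logback appender bridging into the log SDK.
    static void appenderOnLog(String msg) {
        if (OTEL_LOOPBACK.get()) return; // record originated in the SDK; don't re-bridge
        OTEL_LOOPBACK.set(true);
        try {
            sdkEmit(msg);
        } finally {
            OTEL_LOOPBACK.set(false);
        }
    }

    // Stand-in for the log SDK pipeline: export via OTLP, then run the loopback processor.
    static void sdkEmit(String msg) {
        otlpOutput.add(msg);
        loopbackProcessorOnEmit(msg);
    }

    // The proposed new processor, bridging SDK records back to the framework.
    static void loopbackProcessorOnEmit(String msg) {
        if (OTEL_LOOPBACK.get()) return; // record originated in the framework; already logged locally
        OTEL_LOOPBACK.set(true);
        try {
            frameworkLog(msg);
        } finally {
            OTEL_LOOPBACK.set(false);
        }
    }

    public static void main(String[] args) {
        frameworkLog("from logback");  // framework-originated record
        sdkEmit("from event API");     // event-API-originated record
        // Each record reaches both destinations exactly once; no infinite loop.
        System.out.println(frameworkOutput);
        System.out.println(otlpOutput);
    }
}
```

Both sides check the flag before bridging and set it while handing the record to the other side, so a record that entered via logback is never looped back into logback, and a record that entered via the event API is never re-bridged into the SDK.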
If this concept is useful for other languages, we could reserve one of the bits in LogRecord.flags. This could be useful for standardizing the prevention of double recording logs if you use a combination of SDKs emitting via OTLP, and log scraping via the collector. I.e. the log scraper could look at log flags and skip recording a log if the flag indicates it was already processed by an OpenTelemetry SDK and recorded via OTLP.
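To make the flags idea concrete, here's a hypothetical sketch of what a scraper-side check could look like. The bit position and constant name are assumptions for illustration; no such bit is reserved in the spec today:

```java
// Hypothetical sketch: suppose the spec reserved a bit in LogRecord.flags
// meaning "this record was already recorded by an OpenTelemetry SDK".
// The bit position (9) and constant name are invented for this example.
public class FlagsSketch {
    static final int SDK_RECORDED_BIT = 1 << 9;

    // A collector-side log scraper could consult the flag and skip records
    // that an SDK already exported via OTLP.
    static boolean shouldScrape(int flags) {
        return (flags & SDK_RECORDED_BIT) == 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldScrape(0));                // true: not yet recorded by an SDK
        System.out.println(shouldScrape(SDK_RECORDED_BIT)); // false: already exported, skip
    }
}
```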
The idea is neat. However, I suggest triaging it as "deciding:community-feedback" given the cost vs. benefit ratio is not clear (maybe even questionable). At least, I doubt it would be needed for OTel Go.