Add support for structured pydantic log bodies

Open ddluke opened this issue 1 year ago • 1 comments

Description

Hey folks! Our team did a hands-on trial day yesterday to evaluate the capabilities of pydantic-logfire, and the first thing I'd like to say:

  • the ease of use is amazing! :rocket:

pydantic logfire might be one of those desperately needed tools that finally tame OpenTelemetry's technical complexity (digesting tons of documentation and concept pages, understanding or reverse engineering the logs, metrics, and trace SDKs) and its infrastructure complexity (running your own collector), making it easier to adopt for a less technical audience as well.

Integrating pydantic logfire into one of our microservices has been a no-brainer and stupid simple :rocket:

However, we ran into one thing that really puzzled us:

  • pydantic logfire does not support pydantic models as log record bodies and forces log record bodies to be strings

This appears to be a huge technical limitation. We are running a highly automated, distributed mlops stack, and we put decent effort into ensuring that all platform components (microservices running fastapi, orchestration engines (apache-airflow), processing jobs (spark), training jobs and many more) emit fully structured logs. We use pydantic to model our log events, which helps a lot in publishing fully structured, schema-safe log events.

In a nutshell, this is what we do more or less:

import pydantic


class Model(pydantic.BaseModel):
    name: str
    version: str


class Instance(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(populate_by_name=True)

    type_: str = pydantic.Field(alias="type")
    count: int


class EndpointCreated(pydantic.BaseModel):
    name: str
    model: Model
    instance: Instance


# somelogger stands for any logger that accepts a pydantic model as the log body
somelogger.info(
    EndpointCreated(
        name="my_ml_endpoint",
        model=Model(name="foo", version="2"),
        instance=Instance(type_="medium", count=2),
    )
)

Due to the above-mentioned pain points, our logs are (alas) not OpenTelemetry compatible yet, but this is roughly what the resulting log entry could look like in OTel:

{
  "timestamp": 174012134465303,
  "observed_timestamp": 174012134565816,
  "trace_id": "019523d43b42cf4394594759b699305d",
  "span_id": "a9c6c9ec18b1472a",
  "severity_text": "INFO",
  "severity_number": 9,
  "body": {
    "endpoint_name": "my_ml_endpoint",
    "model": {
      "name": "foo",
      "version": "2"
    },
    "instance": {
      "type": "medium",
      "count": 2
    }
  },
  "resource": {
    "service.namespace": "model-serving",
    "service.name": "api",
    "service.version": "3.0.1"
  },
  "attributes": {
    # attributes derived from the pydantic class object 
    "log.record.type": "EndpointCreated",
    "log.record.schema.major": "1",
    "log.record.schema.minor": "3",
    # other standardized otel attributes
  }
}
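A minimal sketch of how such a body and the class-derived attributes might be produced from the model, assuming pydantic v2. `to_log_record` and the `schema_major` / `schema_minor` class variables are hypothetical names for illustration only; they are not part of pydantic or logfire. The models are repeated so the snippet runs on its own.

```python
import json
from typing import Any, ClassVar

import pydantic


class Model(pydantic.BaseModel):
    name: str
    version: str


class Instance(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(populate_by_name=True)

    type_: str = pydantic.Field(alias="type")
    count: int


class EndpointCreated(pydantic.BaseModel):
    # Hypothetical schema version markers (plain class variables, not fields).
    schema_major: ClassVar[str] = "1"
    schema_minor: ClassVar[str] = "3"

    name: str
    model: Model
    instance: Instance


def to_log_record(event: pydantic.BaseModel, severity_text: str = "INFO") -> dict[str, Any]:
    """Build an OTel-style record dict whose body is the dumped model."""
    cls = type(event)
    return {
        "severity_text": severity_text,
        # by_alias=True emits "type" instead of the Python name "type_".
        "body": event.model_dump(mode="json", by_alias=True),
        "attributes": {
            "log.record.type": cls.__name__,
            "log.record.schema.major": getattr(cls, "schema_major", "0"),
            "log.record.schema.minor": getattr(cls, "schema_minor", "0"),
        },
    }


record = to_log_record(
    EndpointCreated(
        name="my_ml_endpoint",
        model=Model(name="foo", version="2"),
        instance=Instance(type_="medium", count=2),
    )
)
print(json.dumps(record, indent=2))
```

Timestamps, trace context and resource attributes are left out here, since they would come from the telemetry SDK rather than from the model. The body keys follow the model's field names and aliases, so the endpoint name appears under `name`.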

Fully structured, machine-parseable log entries (instead of human-readable, unstructured prose) enable a whole new world of automation.

Don't get me wrong. I do understand that a logging library somehow needs to support "good" old f-string logging, but having no support for fully structured log events feels somewhat strange, especially coming from a team that created a powerful and enjoyable python serde library in the first place 🤔

  • Has this been considered, but rejected?
  • If so, for what reasons? 🤔

Amongst lots of other things, think of use cases such as this (creating log views that instantly show all ml models ever deployed, alongside other standardized metadata validated by pydantic):

SELECT
    trace_id
    , span_id
    , body ->> 'endpoint_name' as endpoint_name
    , body -> 'model' ->> 'name' as model_name
    , body -> 'model' ->> 'version' as model_version
    , body -> 'instance' ->> 'type' as instance_type
    , body -> 'instance' ->> 'count' as instance_count
FROM
    records

Kind regards!

ddluke avatar Feb 21 '25 07:02 ddluke

This is similar to https://github.com/pydantic/logfire/issues/867. The main problem is that in OpenTelemetry, all spans should have a low cardinality span name (a string) to allow filtering for related spans. It's not clear what we'd use for that if given an object.
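One conceivable option, sketched here purely as an illustration and not as anything logfire implements: use the model's class name as the low cardinality name and carry the field values as attributes. `span_name_and_attributes` is a hypothetical helper, and the two-field `EndpointCreated` model is a simplified stand-in; the sketch assumes pydantic v2.

```python
from typing import Any

import pydantic


class EndpointCreated(pydantic.BaseModel):
    name: str
    replicas: int


def span_name_and_attributes(event: pydantic.BaseModel) -> tuple[str, dict[str, Any]]:
    """Split an event into a low cardinality name and its structured payload."""
    # The class name is identical for every event of this type,
    # no matter what the field values are.
    span_name = type(event).__name__
    # The (high cardinality) field values travel as attributes instead.
    attributes = event.model_dump(mode="json", by_alias=True)
    return span_name, attributes


first = span_name_and_attributes(EndpointCreated(name="endpoint_a", replicas=1))
second = span_name_and_attributes(EndpointCreated(name="endpoint_b", replicas=4))
print(first)
print(second)
```

Both events share the name `EndpointCreated`, so filtering for related records would still work, while the differing values stay in the attributes.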

alexmojaki avatar May 29 '25 13:05 alexmojaki