Add support for structured pydantic log bodies
Description
Hey folks! Our team just ran a hands-on trial day yesterday to evaluate the capabilities of pydantic-logfire, and the first thing I'd like to say:
- the ease of use is amazing! :rocket:
pydantic logfire might be one of those desperately needed tools that finally simplifies the huge technical complexity of OpenTelemetry (digesting tons of documentation and concept pages, understanding / reverse engineering the logs, metrics, and trace SDKs) and its infrastructure complexity (running your own collector), making it adoptable by a less technical audience as well.
Integrating pydantic logfire into one of our microservices was a no-brainer and stupidly simple :rocket:
However, we ran into one thing that really puzzled us:
- pydantic logfire does not support pydantic models as log bodies and enforces log record bodies to be strings
This appears to be a huge technical limitation. We are running a highly automated, distributed MLOps stack, and we have put decent effort into ensuring that all platform components (microservices running FastAPI, orchestration engines (Apache Airflow), processing jobs (Spark), training jobs, and many more) emit fully structured logs. We use pydantic to model our log events, which helps a lot in publishing fully structured and schema-safe log events.
In a nutshell, this is more or less what we do:
```python
import pydantic


class Model(pydantic.BaseModel):
    name: str
    version: str


class Instance(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(populate_by_name=True)

    type_: str = pydantic.Field(alias="type")
    count: int


class EndpointCreated(pydantic.BaseModel):
    name: str
    model: Model
    instance: Instance


somelogger.info(
    EndpointCreated(
        name="my_ml_endpoint",
        model=Model(name="foo", version="2"),
        instance=Instance(type_="medium", count=2),
    )
)
```
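For reference, pydantic can already render such an event as a plain dict, which is a sketch of what a structured body could contain (`model_dump(by_alias=True)` is pydantic v2 API; the dump itself is our illustration, not anything logfire does today):

```python
import pydantic


class Model(pydantic.BaseModel):
    name: str
    version: str


class Instance(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(populate_by_name=True)

    type_: str = pydantic.Field(alias="type")
    count: int


class EndpointCreated(pydantic.BaseModel):
    name: str
    model: Model
    instance: Instance


event = EndpointCreated(
    name="my_ml_endpoint",
    model=Model(name="foo", version="2"),
    instance=Instance(type_="medium", count=2),
)

# by_alias=True emits the public field names ("type" instead of "type_"),
# producing a plain dict that could serve as a structured log body.
body = event.model_dump(by_alias=True)
```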
Due to the above-mentioned pain points, our logs are (alas) not OpenTelemetry compatible yet, but this is roughly what the resulting log entry could look like in OTel:
```jsonc
{
    "timestamp": 174012134465303,
    "observed_timestamp": 174012134565816,
    "trace_id": "019523d43b42cf4394594759b699305d",
    "span_id": "a9c6c9ec18b1472a",
    "severity_text": "INFO",
    "severity_number": 9,
    "body": {
        "endpoint_name": "my_ml_endpoint",
        "model": {
            "name": "foo",
            "version": "2"
        },
        "instance": {
            "type": "medium",
            "count": 2
        }
    },
    "resource": {
        "service.namespace": "model-serving",
        "service.name": "api",
        "service.version": "3.0.1"
    },
    "attributes": {
        // attributes derived from the pydantic class object
        "log.record.type": "EndpointCreated",
        "log.record.schema.major": "1",
        "log.record.schema.minor": "3"
        // other standardized otel attributes
    }
}
```
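The class-derived attributes in that example could be produced by a small helper. This is a hypothetical sketch, the `otel_attributes` helper and the `schema_major`/`schema_minor` class attributes are assumptions for illustration, not part of any library:

```python
class EndpointCreated:
    """Example event class carrying hypothetical schema metadata."""

    schema_major = 1
    schema_minor = 3


def otel_attributes(event: object) -> dict:
    # Derive standardized attributes from the event's class; falls back
    # to 0 when the (assumed) schema metadata is absent.
    return {
        "log.record.type": type(event).__name__,
        "log.record.schema.major": str(getattr(event, "schema_major", 0)),
        "log.record.schema.minor": str(getattr(event, "schema_minor", 0)),
    }


attrs = otel_attributes(EndpointCreated())
```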
Fully structured, machine-parseable log entries (instead of human-readable unstructured prose) enable a whole new world of automation.
Don't get me wrong. I do understand that a logging library somehow needs to support "good" old f-string logging, but having no support for fully structured log events feels somewhat strange, especially coming from a team that essentially created a powerful and enjoyable Python serde library in the first place 🤔
- Has this been considered, but rejected?
- If so, for what reasons? 🤔
Among lots of other things, think of use cases such as this (creating log views that instantly show all ML models ever deployed, alongside other standardized metadata validated by pydantic):
```sql
SELECT
    trace_id
  , span_id
  , body ->> 'endpoint_name'       AS endpoint_name
  -- use -> for intermediate keys: ->> returns text and cannot be chained
  , body -> 'model' ->> 'name'     AS model_name
  , body -> 'model' ->> 'version'  AS model_version
  , body -> 'instance' ->> 'type'  AS instance_type
  , body -> 'instance' ->> 'count' AS instance_count
FROM
    records
```
Kind regards!
This is similar to https://github.com/pydantic/logfire/issues/867. The main problem is that in OpenTelemetry, all spans should have a low cardinality span name (a string) to allow filtering for related spans. It's not clear what we'd use for that if given an object.
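For what it's worth, one hypothetical convention (a sketch, not anything logfire implements) would be to derive a low-cardinality span name from the event class name itself, since the set of event classes is small and fixed:

```python
import re


def span_name_for(event: object) -> str:
    """Derive a snake_case, low-cardinality span name from the event
    class name, e.g. EndpointCreated -> "endpoint_created"."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", type(event).__name__).lower()


class EndpointCreated:
    pass


name = span_name_for(EndpointCreated())
```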