feat: support emitting events for prompts in addition to current behavior

Open LuizDMM opened this issue 9 months ago • 4 comments

/claim #2456

✅ PR Requirements

  • [x] I have added tests that cover my changes.
  • [ ] If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • [x] PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • [ ] (If applicable) I have updated the documentation accordingly.

📌 Issue Requirements

Fixes the following packages and their respective tests:

  • [x] AlephAlpha
  • [x] Anthropic
  • [x] Bedrock
  • [x] Cohere
  • [x] Google Generative AI
  • [x] Groq
  • [x] Langchain
  • [x] LlamaIndex
  • [x] Mistral AI
  • [x] Ollama
  • [x] OpenAI
  • [x] Replicate
  • [x] SageMaker
  • [x] Together
  • [ ] Transformers*
  • [ ] VertexAI*
  • [x] WatsonX

Additional requirements implemented:

  • [x] Maintain compatibility with the current attribute-based behavior, and add support for the new event-based system using the official semantic conventions.
  • [x] Add a use_legacy_attributes config flag to each instrumentation (default: true). When false, events are emitted instead of legacy prompt/completion attributes.
  • [x] Add a global SDK initialization parameter use_legacy_attributes that sets the flag across all instrumentations (usage sketched below).
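For illustration, here is a minimal usage sketch of the two flags described above; the parameter placement follows the checklist, but the exact wiring in each package may differ from the final code:

```python
# Hypothetical usage sketch -- flag names follow the checklist above, but the
# exact wiring in each package may differ.

# Global: set once at SDK initialization and applied to every instrumentation.
from traceloop.sdk import Traceloop

Traceloop.init(app_name="my-app", use_legacy_attributes=False)

# Per instrumentation: override the flag when instrumenting a single package.
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor(use_legacy_attributes=False).instrument()
```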

✍️ Personal Notes

Initially, I tried building on top of the existing PR, but it didn’t seem to account for the nuances of each package and the code was quite hard to follow — so I ended up starting from scratch.

I’ve implemented all the required changes for the packages listed above. For testing, I modified existing tests to assert that events are not emitted when use_legacy_attributes == True, ensuring backward compatibility is maintained.

Additionally, I created two new versions of each test (a simplified sketch follows this list):

  • One to verify that the new event-based system works as expected, with the message contents included in the emitted events
  • Another to verify that it works as expected with content capture disabled, following the OpenTelemetry convention and the provided example
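Roughly, the two new variants look like this; the fixtures (log_exporter, instrument_with_content, instrument_no_content, openai_client) and the exact event shapes are illustrative assumptions, not the suite's real names:

```python
# Simplified sketch of the two new test variants; fixture names and event
# shapes are illustrative, not the actual ones used in the test suite.
def test_events_with_content(log_exporter, instrument_with_content, openai_client):
    openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
    )
    logs = log_exporter.get_finished_logs()
    # With content capture enabled, the prompt text is carried in the event body.
    assert any("Tell me a joke" in str(log.log_record.body) for log in logs)


def test_events_without_content(log_exporter, instrument_no_content, openai_client):
    openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
    )
    logs = log_exporter.get_finished_logs()
    # With content capture disabled, events are still emitted but without message text.
    assert logs
    assert all("Tell me a joke" not in str(log.log_record.body) for log in logs)
```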

These changes are ready for review and merging.

⚠️ Pending Items and Questions

  • VertexAI and Transformers currently don’t have tests, so I couldn’t safely implement event emission for them yet. I plan to work on these soon — do you want me to create the legacy and event tests for them as well?
  • Langchain and LlamaIndex haven’t been checked yet — I’ll be reviewing them today.
  • I'm having trouble getting the new OpenAI tests to pass when I change the instrumentation fixture's scope to function during testing, which is likely related to the _uninstrument method (roughly the pattern sketched below). I'll post the full details on Slack.
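For context, this is roughly the fixture pattern I mean; it's a hypothetical sketch (the fixture name is made up), not the actual test code:

```python
# Hypothetical sketch of a function-scoped instrumentation fixture; the
# fixture name is illustrative.
import pytest
from opentelemetry.instrumentation.openai import OpenAIInstrumentor


@pytest.fixture(scope="function")
def instrument_openai():
    instrumentor = OpenAIInstrumentor()
    instrumentor.instrument()
    yield
    # Tearing down and re-instrumenting for every test is where the
    # _uninstrument-related problem seems to show up.
    instrumentor.uninstrument()
```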

I’ll continue working on the remaining instrumentations today and tomorrow, but the current work is ready for early review. Feedback is welcome!

LuizDMM avatar Apr 10 '25 01:04 LuizDMM

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Apr 10 '25 01:04 CLAassistant

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at [email protected]


Generated with ❤️ by ellipsis.dev

ellipsis-dev[bot] avatar Apr 10 '25 01:04 ellipsis-dev[bot]

Hey @nirga, I'm ready for a review on this one.

LuizDMM avatar Apr 28 '25 01:04 LuizDMM

  1. I think this is a huge PR to merge in one shot, given that the repo is constantly updating. I would love it if we could split it into multiple smaller PRs and get them in one by one.

I'll start with alephalpha, a small one, so we can adapt the changes on a smaller scale.

  2. (This is actually the bigger issue here.) The code is way too messy. We're now plagued with lots of if self._emit_events_otel checks everywhere, and this will be a nightmare to maintain. I think the structure of the code should be completely different: the instrumentation itself should output everything in one format (say, the events format, though it could even be just a pydantic object or a dataclass), and then there should be a centralized place where this object gets translated into the right format according to the _emit_events_otel flag. We could even have a single piece of code that does that for all instrumentations.

Ok, to avoid adding pydantic to every instrumentation as a dependency, I'll use a dataclass. I'm thinking of implementing the same dataclass in each instrumentation, because it seems that not every instrumentation currently has Traceloop's package as a dependency. What do you think about that? Something along these lines:
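A rough sketch of the idea; the dataclass fields, attribute keys, and the event-logger interface are placeholders, not the final API:

```python
# Rough illustrative sketch -- names, attribute keys, and the event-logger
# interface are placeholders, not the final API.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PromptMessage:
    """Neutral representation of a prompt/completion built by each instrumentation."""
    role: str
    content: Any
    index: int = 0
    attributes: Dict[str, Any] = field(default_factory=dict)


def export_message(message, span, event_logger, use_legacy_attributes):
    """Single translation point shared by all instrumentations."""
    if use_legacy_attributes:
        # Legacy path: flatten onto span attributes, as the instrumentations do
        # today (attribute keys shown here are indicative).
        span.set_attribute(f"gen_ai.prompt.{message.index}.role", message.role)
        span.set_attribute(f"gen_ai.prompt.{message.index}.content", str(message.content))
    else:
        # Event path: hand the same data to the event logger, shaped according
        # to the GenAI semantic conventions (placeholder interface).
        event_logger.emit(
            name="gen_ai.prompt",
            body={"role": message.role, "content": message.content},
            attributes=message.attributes,
        )
```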

LuizDMM avatar May 02 '25 23:05 LuizDMM