Add LLM observability and tracing
Summary
Docs includes an integrated LLM assistant to help users while editing. Some users report that responses are slow, inaccurate, or occasionally include out-of-context content. We currently lack detailed visibility into how the LLM behaves in production, which limits our ability to diagnose and improve the experience. I propose adding LLM observability (e.g., via Langfuse) to trace, analyze, and debug all LLM interactions.
Problem
We have no structured observability of LLM usage in Docs. As a result, we cannot easily identify:
- why some responses are slow,
- which prompts lead to low-quality or hallucinated output,
- how tools are being called internally,
- or whether issues correlate with specific users, documents, or workflows.
This makes debugging slow and prevents data-driven improvements.
Proposed Solution
Integrate an LLM observability tool, such as Langfuse, to trace all LLM interactions. This should include:
- Logging the input prompt and any system/internal transformations
- Logging the model output, latency, and token usage
- Logging all tool calls with inputs/outputs
- Linking each trace to the user ID for cross-referencing with specific users, documents, and workflows
This will allow us to diagnose issues, understand failure patterns, and iterate on the LLM feature with real production data.
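To make the proposal concrete, here is a minimal sketch of what the instrumentation could look like, assuming the Langfuse Python SDK v2-style client API. The model client, tool dispatcher, and response types (`call_model`, `run_tool`, `ModelResponse`, `ToolCall`, `SYSTEM_PROMPT`) are hypothetical stand-ins for whatever Docs uses internally, not part of any existing code.

```python
# Minimal sketch of the proposed instrumentation, assuming the Langfuse
# Python SDK v2-style client API. SYSTEM_PROMPT, call_model, run_tool, and
# the ModelResponse/ToolCall types are hypothetical stand-ins for the Docs
# backend; swap in the real model client and tool dispatcher.
from dataclasses import dataclass, field

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST

SYSTEM_PROMPT = "You are the Docs editing assistant."  # hypothetical


@dataclass
class ToolCall:  # hypothetical stand-in for the assistant's tool-call objects
    name: str
    arguments: dict


@dataclass
class ModelResponse:  # hypothetical stand-in for the model client's response
    text: str
    prompt_tokens: int
    completion_tokens: int
    tool_calls: list[ToolCall] = field(default_factory=list)


def call_model(prompt: str) -> ModelResponse:
    # Hypothetical: replace with the real model call used by Docs.
    return ModelResponse(text="(model output)", prompt_tokens=42, completion_tokens=7)


def run_tool(call: ToolCall) -> dict:
    # Hypothetical: replace with the real tool dispatcher.
    return {"result": "ok"}


def answer_user(user_id: str, document_id: str, prompt: str) -> str:
    # One trace per assistant interaction, linked to the user ID so issues
    # can be cross-referenced with specific users, documents, and workflows.
    trace = langfuse.trace(
        name="docs-assistant",
        user_id=user_id,
        metadata={"document_id": document_id},
    )

    # Log the input prompt after system/internal transformations.
    full_prompt = f"{SYSTEM_PROMPT}\n\n{prompt}"

    # Langfuse derives latency from the generation's start/end timestamps,
    # so model output, latency, and token usage are all captured here.
    generation = trace.generation(
        name="assistant-completion",
        model="gpt-4o",  # whichever model Docs actually uses
        input=full_prompt,
    )
    response = call_model(full_prompt)
    generation.end(
        output=response.text,
        usage={"input": response.prompt_tokens, "output": response.completion_tokens},
    )

    # Log every tool call as a child span with its inputs and outputs.
    for tool_call in response.tool_calls:
        span = trace.span(name=f"tool:{tool_call.name}", input=tool_call.arguments)
        span.end(output=run_tool(tool_call))

    langfuse.flush()  # make sure events are delivered before the request returns
    return response.text
```

With traces linked to user IDs like this, slow or low-quality interactions could be filtered by user, document, or tool in the Langfuse UI, which addresses the gaps listed under Problem. If the manual wiring proves noisy, Langfuse's `@observe` decorator could likely replace much of it, though that depends on how the Docs backend is structured.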