Add LLM observability and tracing
Summary
Docs includes an integrated LLM assistant to help users while editing. Some users report that responses are slow, inaccurate, or occasionally include out-of-context content. We currently lack detailed visibility into how the LLM behaves in production, which limits our ability to diagnose and improve the experience. I propose adding LLM observability (e.g., via Langfuse) to trace, analyze, and debug all LLM interactions.
Problem
We have no structured observability of LLM usage in Docs. As a result, we cannot easily identify:
- why some responses are slow,
- which prompts lead to low-quality or hallucinated output,
- how tools are being called internally,
- or whether issues correlate with specific users, documents, or workflows.
This makes debugging slow and prevents data-driven improvements.
Proposed Solution
Integrate an LLM observability tool, such as Langfuse, to trace all LLM interactions. This should include:
- Logging the input prompt and any system/internal transformations
- Logging the model output, latency, and token usage
- Logging all tool calls with inputs/outputs
- Linking each trace to the user ID for cross-referencing with specific users, documents, and workflows
This will allow us to diagnose issues, understand failure patterns, and iterate on the LLM feature with real production data.
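To make the proposal concrete, here is a minimal sketch of what the instrumentation could look like, assuming the Langfuse Python SDK v2-style client API. The model client, tool dispatcher, and response types (`call_model`, `run_tool`, `ModelResponse`, `ToolCall`, `SYSTEM_PROMPT`) are hypothetical stand-ins for whatever Docs uses internally, not part of any existing code.

```python
# Minimal sketch of the proposed instrumentation, assuming the Langfuse
# Python SDK v2-style client API. SYSTEM_PROMPT, call_model, run_tool, and
# the ModelResponse/ToolCall types are hypothetical stand-ins for the Docs
# backend; swap in the real model client and tool dispatcher.
from dataclasses import dataclass, field

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST

SYSTEM_PROMPT = "You are the Docs editing assistant."  # hypothetical


@dataclass
class ToolCall:  # hypothetical stand-in for the assistant's tool-call objects
    name: str
    arguments: dict


@dataclass
class ModelResponse:  # hypothetical stand-in for the model client's response
    text: str
    prompt_tokens: int
    completion_tokens: int
    tool_calls: list[ToolCall] = field(default_factory=list)


def call_model(prompt: str) -> ModelResponse:
    # Hypothetical: replace with the real model call used by Docs.
    return ModelResponse(text="(model output)", prompt_tokens=42, completion_tokens=7)


def run_tool(call: ToolCall) -> dict:
    # Hypothetical: replace with the real tool dispatcher.
    return {"result": "ok"}


def answer_user(user_id: str, document_id: str, prompt: str) -> str:
    # One trace per assistant interaction, linked to the user ID so issues
    # can be cross-referenced with specific users, documents, and workflows.
    trace = langfuse.trace(
        name="docs-assistant",
        user_id=user_id,
        metadata={"document_id": document_id},
    )

    # Log the input prompt after system/internal transformations.
    full_prompt = f"{SYSTEM_PROMPT}\n\n{prompt}"

    # Langfuse derives latency from the generation's start/end timestamps,
    # so model output, latency, and token usage are all captured here.
    generation = trace.generation(
        name="assistant-completion",
        model="gpt-4o",  # whichever model Docs actually uses
        input=full_prompt,
    )
    response = call_model(full_prompt)
    generation.end(
        output=response.text,
        usage={"input": response.prompt_tokens, "output": response.completion_tokens},
    )

    # Log every tool call as a child span with its inputs and outputs.
    for tool_call in response.tool_calls:
        span = trace.span(name=f"tool:{tool_call.name}", input=tool_call.arguments)
        span.end(output=run_tool(tool_call))

    langfuse.flush()  # make sure events are delivered before the request returns
    return response.text
```

With traces linked to user IDs like this, slow or low-quality interactions could be filtered by user, document, or tool in the Langfuse UI, which addresses the gaps listed under Problem. If the manual wiring proves noisy, Langfuse's `@observe` decorator could likely replace much of it, though that depends on how the Docs backend is structured.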