Add Langfuse diagnostics and GiteeAI plugin

Open geffzhang opened this issue 3 months ago • 6 comments

User description

Introduced OpenTelemetry-based model diagnostics with Langfuse integration, including new helper classes and activity tracing for agent and function execution. Added BotSharp.Plugin.GiteeAI with chat and embedding providers, and updated solution/project files to register the new plugin. Enhanced tracing in routing, executor, and controller logic for improved observability.


PR Type

Enhancement, Documentation


Description

  • Implemented OpenTelemetry-based model diagnostics with Langfuse integration for improved observability

  • Added GiteeAI plugin with chat completion and text embedding providers

  • Enhanced tracing in routing, executor, and controller logic for agent and function execution

  • Integrated activity tracking with semantic conventions for GenAI operations


Diagram Walkthrough

flowchart LR
  A["OpenTelemetry Setup"] --> B["ModelDiagnostics Helper"]
  B --> C["Activity Tracing"]
  C --> D["Chat Providers"]
  C --> E["Function Executors"]
  C --> F["Routing Service"]
  G["GiteeAI Plugin"] --> D
  H["Langfuse Config"] --> A

File Walkthrough

Relevant files

Configuration changes (6 files)
  • Extensions.cs: Configure OpenTelemetry with Langfuse exporter (+49/-3)
  • BotSharp.Plugin.GiteeAI.csproj: Create GiteeAI plugin project file (+31/-0)
  • BotSharp.sln: Register GiteeAI plugin in solution (+11/-0)
  • WebStarter.csproj: Add GiteeAI plugin project reference (+1/-0)
  • appsettings.json: Add Langfuse and GiteeAI model configurations (+85/-40)
  • Program.cs: Comment out MCP service configuration (+2/-2)

Enhancement (17 files)
  • LangfuseSettings.cs: Add Langfuse configuration settings model (+19/-0)
  • ModelDiagnostics.cs: Implement model diagnostics with semantic conventions (+394/-0)
  • ActivityExtensions.cs: Add activity extension methods for tracing (+119/-0)
  • AppContextSwitchHelper.cs: Helper to read app context switch values (+35/-0)
  • FunctionCallbackExecutor.cs: Add activity tracing to function execution (+16/-2)
  • MCPToolExecutor.cs: Add activity tracing to MCP tool execution (+34/-17)
  • RoutingService.InvokeAgent.cs: Add agent invocation activity tracing (+4/-1)
  • RoutingService.InvokeFunction.cs: Import diagnostics for function invocation (+1/-0)
  • RoutingService.cs: Add System.Diagnostics import for tracing (+1/-0)
  • ConversationController.cs: Add activity tracing to conversation endpoints (+35/-19)
  • ChatCompletionProvider.cs: Integrate model diagnostics into chat completion (+78/-66)
  • ChatCompletionProvider.cs: Integrate model diagnostics into chat completion (+65/-52)
  • GiteeAiPlugin.cs: Create GiteeAI plugin with DI registration (+19/-0)
  • ChatCompletionProvider.cs: Implement GiteeAI chat completion provider (+496/-0)
  • TextEmbeddingProvider.cs: Implement GiteeAI text embedding provider (+73/-0)
  • ProviderHelper.cs: Add helper to create GiteeAI client instances (+16/-0)
  • Using.cs: Add global using statements for GiteeAI plugin (+15/-0)

Documentation (1 file)
  • README.md: Add GiteeAI plugin documentation (+8/-0)

Formatting (1 file)
  • Program.cs: Reorder using statements for clarity (+2/-3)

geffzhang avatar Oct 17 '25 14:10 geffzhang

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Sensitive data in traces

Description: Diagnostic switches enable sensitive events which may capture and export user
prompts/messages and agent inputs/outputs in traces, risking PII exposure if telemetry
backend is not properly secured.
Extensions.cs [52-55]

Referred Code
// Enable model diagnostics with sensitive data.
AppContext.SetSwitch("BotSharp.Experimental.GenAI.EnableOTelDiagnostics", true);
AppContext.SetSwitch("BotSharp.Experimental.GenAI.EnableOTelDiagnosticsSensitive", true);
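
One way to reduce this exposure is to leave the sensitive switch off unless an operator explicitly opts in. The following is a minimal sketch, not the PR's code: the switch names come from the referred code above, while the environment variable `BOTSHARP_OTEL_SENSITIVE` and the helper `IsOptedIn` are hypothetical names introduced only for illustration.

```csharp
using System;

// Switch names are taken from the referred code above.
const string DiagnosticsSwitch = "BotSharp.Experimental.GenAI.EnableOTelDiagnostics";
const string SensitiveSwitch = "BotSharp.Experimental.GenAI.EnableOTelDiagnosticsSensitive";

// Hypothetical opt-in variable; it is not part of the PR.
const string SensitiveOptInVariable = "BOTSHARP_OTEL_SENSITIVE";

// Returns true only for an explicit "true" (case-insensitive); anything else,
// including a missing value, keeps the safe default of false.
static bool IsOptedIn(string? raw) =>
    bool.TryParse(raw, out var enabled) && enabled;

// Span metadata stays on; prompt and response content is captured only on opt-in.
AppContext.SetSwitch(DiagnosticsSwitch, true);
AppContext.SetSwitch(
    SensitiveSwitch,
    IsOptedIn(Environment.GetEnvironmentVariable(SensitiveOptInVariable)));

AppContext.TryGetSwitch(SensitiveSwitch, out var sensitiveEnabled);
Console.WriteLine($"Sensitive diagnostics enabled: {sensitiveEnabled}");
```

With this shape, a deployment that never sets the variable still emits spans but does not record message content.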

Insecure auth transport

Description: Constructs Basic Auth header for OTLP exporter to Langfuse and sends it over HTTP
Protobuf; if the host is misconfigured to non-TLS, credentials could be exposed.
Extensions.cs [139-151]

Referred Code
var publicKey = langfuseSection.GetValue<string>(nameof(LangfuseSettings.PublicKey)) ?? string.Empty;
var secretKey = langfuseSection.GetValue<string>(nameof(LangfuseSettings.SecretKey)) ?? string.Empty;
var host = langfuseSection.GetValue<string>(nameof(LangfuseSettings.Host)) ?? string.Empty;
var plainTextBytes = System.Text.Encoding.UTF8.GetBytes($"{publicKey}:{secretKey}");
string base64EncodedAuth = Convert.ToBase64String(plainTextBytes);

builder.Services.ConfigureOpenTelemetryTracerProvider(tracing => tracing.AddOtlpExporter(options =>
{
    options.Endpoint = new Uri(host);
    options.Protocol = OtlpExportProtocol.HttpProtobuf;
    options.Headers = $"Authorization=Basic {base64EncodedAuth}";
})
);
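
A possible guard against the non-TLS case is to reject any host that is not an absolute `https://` URI before the exporter is configured. This is a sketch under assumptions: `TryGetSecureEndpoint` is a hypothetical helper that does not exist in the PR, and the example host names are placeholders.

```csharp
using System;

// Hypothetical helper, not part of the PR: accepts the configured Langfuse
// host only when it parses as an absolute URI with the https scheme.
static bool TryGetSecureEndpoint(string? host, out Uri? endpoint)
{
    endpoint = null;
    if (!Uri.TryCreate(host, UriKind.Absolute, out var parsed))
    {
        return false; // missing, empty, or malformed host
    }
    if (parsed.Scheme != Uri.UriSchemeHttps)
    {
        return false; // any other scheme would expose the Basic credentials
    }
    endpoint = parsed;
    return true;
}

Console.WriteLine(TryGetSecureEndpoint("https://langfuse.example.com", out _)); // True
Console.WriteLine(TryGetSecureEndpoint("http://langfuse.example.com", out _));  // False
Console.WriteLine(TryGetSecureEndpoint("", out _));                             // False
```

The exporter registration shown above would then run only when the helper returns true, which also avoids the `UriFormatException` raised by `new Uri(host)` on an empty string.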
Ticket Compliance
🎫 No ticket provided

Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review[bot] avatar Oct 17 '25 14:10 qodo-code-review[bot]

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Refactor providers to reduce duplication

The ChatCompletionProvider for the new GiteeAI plugin duplicates code from the
OpenAI and AzureOpenAI providers. To improve maintainability, refactor this by
creating a shared base class for providers with OpenAI-compatible APIs to house
the common logic.

Examples:

src/Plugins/BotSharp.Plugin.GiteeAI/Providers/Chat/ChatCompletionProvider.cs [15-100]
public class ChatCompletionProvider(
    ILogger<ChatCompletionProvider> logger,
    IServiceProvider services) : IChatCompletion
{
    protected string _model = string.Empty;

    public virtual string Provider => "gitee-ai";

    public string Model => _model;


 ... (clipped 76 lines)
src/Plugins/BotSharp.Plugin.OpenAI/Providers/Chat/ChatCompletionProvider.cs [35-119]
    public async Task<RoleDialogModel> GetChatCompletions(Agent agent, List<RoleDialogModel> conversations)
    {
        var contentHooks = _services.GetHooks<IContentGeneratingHook>(agent.Id);
        var convService = _services.GetService<IConversationStateService>();

        // Before chat completion hook
        foreach (var hook in contentHooks)
        {
            await hook.BeforeGenerating(agent, conversations);
        }

 ... (clipped 75 lines)

Solution Walkthrough:

Before:

// In GiteeAI/ChatCompletionProvider.cs
public class ChatCompletionProvider : IChatCompletion
{
    public async Task<RoleDialogModel> GetChatCompletions(...)
    {
        // ... setup ...
        var (prompt, messages, options) = PrepareOptions(agent, conversations);
        using (var activity = ModelDiagnostics.StartCompletionActivity(...))
        {
            // ... call API, process response, set tags ...
        }
    }
    protected (string, ...) PrepareOptions(...) { /* ... complex logic ... */ }
}

// In OpenAI/ChatCompletionProvider.cs
public class ChatCompletionProvider : IChatCompletion
{
    // ... Nearly identical implementation to GiteeAI ...
}

After:

// New Base Class
public abstract class OpenAiCompatibleChatCompletionProvider : IChatCompletion
{
    public abstract string Provider { get; }

    public async Task<RoleDialogModel> GetChatCompletions(Agent agent, List<RoleDialogModel> conversations)
    {
        // ... common setup ...
        var (prompt, messages, options) = PrepareOptions(agent, conversations);
        using (var activity = ModelDiagnostics.StartCompletionActivity(..., Provider, ...))
        {
            // ... common logic for API call, response processing, tags ...
        }
    }

    protected virtual (string, ...) PrepareOptions(...) { /* ... common complex logic ... */ }
}

// Refactored GiteeAI provider
public class GiteeAiChatCompletionProvider : OpenAiCompatibleChatCompletionProvider
{
    public override string Provider => "gitee-ai";
    // ... minimal overrides if needed ...
}

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies significant code duplication across the new GiteeAI provider and the existing OpenAI and AzureOpenAI providers, which this PR exacerbates, and proposes a valid refactoring that would greatly improve maintainability.

High
Possible issue
Fix incorrect null check

Replace the incorrect null check on langfuseSection with
langfuseSection.Exists() to correctly determine if the configuration section is
present.

src/BotSharp.ServiceDefaults/Extensions.cs [129-130]

 var langfuseSection = builder.Configuration.GetSection("Langfuse");
-var useLangfuse = langfuseSection != null;
+var useLangfuse = langfuseSection.Exists();
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that GetSection never returns null, fixing a bug where useLangfuse would always be true, which would lead to a UriFormatException if the configuration section is missing.
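
The difference between the two checks can be reproduced in isolation. The sketch below assumes the `Microsoft.Extensions.Configuration` NuGet package is referenced; the in-memory key `Logging:Level` is arbitrary and only makes the configuration non-empty.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

// Configuration that deliberately contains no "Langfuse" section.
IConfiguration configuration = new ConfigurationBuilder()
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["Logging:Level"] = "Information"
    })
    .Build();

var langfuseSection = configuration.GetSection("Langfuse");

// GetSection always returns a section object, so this prints True.
Console.WriteLine(langfuseSection != null);

// Exists() checks for a value or children, so this prints False.
Console.WriteLine(langfuseSection.Exists());
```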

Medium
Validate Langfuse configuration values

Add validation to ensure publicKey, secretKey, and host from the Langfuse
configuration are not empty before attempting to create and use them for
authentication.

src/BotSharp.ServiceDefaults/Extensions.cs [139-143]

 var publicKey = langfuseSection.GetValue<string>(nameof(LangfuseSettings.PublicKey)) ?? string.Empty;
 var secretKey = langfuseSection.GetValue<string>(nameof(LangfuseSettings.SecretKey)) ?? string.Empty;
 var host = langfuseSection.GetValue<string>(nameof(LangfuseSettings.Host)) ?? string.Empty;
+
+if (string.IsNullOrEmpty(publicKey) || string.IsNullOrEmpty(secretKey) || string.IsNullOrEmpty(host))
+{
+    return builder;
+}
+
 var plainTextBytes = System.Text.Encoding.UTF8.GetBytes($"{publicKey}:{secretKey}");
 string base64EncodedAuth = Convert.ToBase64String(plainTextBytes);
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that missing configuration values will lead to a UriFormatException and adds necessary validation to prevent this runtime error, improving the code's robustness.

Medium
General
Remove duplicate tag assignment
Suggestion Impact: The commit removed the duplicated assignment of ModelDiagnosticsTags.OutputTokens, leaving a single SetTag call.

code diff:

             activity?.SetTag(ModelDiagnosticsTags.InputTokens, (tokenUsage?.InputTokenCount ?? 0) - (inputTokenDetails?.CachedTokenCount ?? 0));
-            activity?.SetTag(ModelDiagnosticsTags.OutputTokens, tokenUsage?.OutputTokenCount ?? 0);
-            activity?.SetTag(ModelDiagnosticsTags.OutputTokens, tokenUsage?.OutputTokenCount ?? 0);
+            activity?.SetTag(ModelDiagnosticsTags.OutputTokens, tokenUsage?.OutputTokenCount ?? 0); 
            

Remove the duplicate line that sets the OutputTokens tag.

src/Plugins/BotSharp.Plugin.AzureOpenAI/Providers/Chat/ChatCompletionProvider.cs [128-130]

 activity?.SetTag(ModelDiagnosticsTags.InputTokens, (tokenUsage?.InputTokenCount ?? 0) - (inputTokenDetails?.CachedTokenCount ?? 0));
 activity?.SetTag(ModelDiagnosticsTags.OutputTokens, tokenUsage?.OutputTokenCount ?? 0);
-activity?.SetTag(ModelDiagnosticsTags.OutputTokens, tokenUsage?.OutputTokenCount ?? 0);

[Suggestion processed]

Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies a duplicated line of code that sets the OutputTokens tag twice and proposes removing the redundant line, which cleans up the code.
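
The low importance score is consistent with how `Activity.SetTag` behaves: it adds or updates the tag for a key, so the second call overwrote the first instead of producing two entries. A small sketch of that behavior follows; the tag key string is an assumption, since the actual value of `ModelDiagnosticsTags.OutputTokens` is not shown in this thread.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// A standalone Activity is enough to observe tag behavior;
// no listener or exporter is required.
var activity = new Activity("chat.completion");

// Assumed key, used only for this illustration.
const string OutputTokens = "gen_ai.usage.output_tokens";

activity.SetTag(OutputTokens, 42);
activity.SetTag(OutputTokens, 42); // updates the existing entry, adds nothing

var count = activity.TagObjects.Count(t => t.Key == OutputTokens);
Console.WriteLine(count); // 1
```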

Low
Remove unnecessary semicolon
Suggestion Impact: The stray semicolon following the foreach block was removed in the commit.

code diff:

@@ -42,8 +57,6 @@
         {
             activity.SetTag(tag.Key, tag.Value);
         }
-        ;
-
         return activity;

Remove the unnecessary semicolon after the foreach loop's closing brace.

src/Infrastructure/BotSharp.Abstraction/Diagnostics/ActivityExtensions.cs [43-45]

 activity.SetTag(tag.Key, tag.Value);
 }
-;
 
 return activity;

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 2

__

Why: The suggestion correctly identifies and removes a stray semicolon, which is a minor code style and readability improvement.

Low

qodo-code-review[bot] avatar Oct 17 '25 14:10 qodo-code-review[bot]

This is the result of integrating OpenTelemetry with Langfuse. The integration enables real-time tracing and streaming collection of inputs, outputs, execution states, and exceptions for each node in complex workflows, enhancing observability for debugging and performance optimization.

geffzhang avatar Oct 20 '25 13:10 geffzhang

> This is the result of integrating OpenTelemetry with Langfuse. The integration enables real-time tracing and streaming collection of inputs, outputs, execution states, and exceptions for each node in complex workflows, enhancing observability for debugging and performance optimization.

Is there any setting to turn on/off the OpenTelemetry with Langfuse?

iceljc avatar Nov 06 '25 22:11 iceljc

> This is the result of integrating OpenTelemetry with Langfuse. The integration enables real-time tracing and streaming collection of inputs, outputs, execution states, and exceptions for each node in complex workflows, enhancing observability for debugging and performance optimization.
>
> Is there any setting to turn on/off the OpenTelemetry with Langfuse?

I will refactor this part of the code.

geffzhang avatar Nov 16 '25 02:11 geffzhang

High-Level Objectives

  1. Introduce generalized OpenTelemetry (OTel) model diagnostics with semantic conventions for GenAI operations.
  2. Integrate Langfuse tracing (refactoring away from the prior dedicated BotSharp.Diagnostics.Langfuse project into a unified ModelDiagnostics approach).
  3. Add a new plugin: BotSharp.Plugin.GiteeAI (chat completion + text embedding providers).
  4. Extend tracing coverage across:
    • ConversationController (incoming requests)
    • RoutingService (agent selection and function dispatch)
    • FunctionCallbackExecutor and MCPToolExecutor (tool/function execution)
    • Model providers (Azure OpenAI, OpenAI, and newly added GiteeAI) with activity spans for prompt/response lifecycles.
  5. Enhance configuration (appsettings.json) to include Langfuse and GiteeAI model entries.
  6. Provide initial plugin documentation in README.md.
  7. Prepare architecture diagrams and walkthrough to clarify the tracing and provider orchestration flow.

Architectural Evolution

Previously, Langfuse integration appears to have been isolated (e.g., via a specialized diagnostics project). This PR consolidates diagnostic responsibilities into:

  • ModelDiagnostics.cs: A single, extensible instrumentation layer
  • ActivityExtensions.cs: Shared helpers for span enrichment (e.g., tags for model name, input length, output length, latency, tool invocation context)

The removal/refactoring of BotSharp.Diagnostics.Langfuse (as indicated in your request) results in a cleaner, provider-agnostic tracing path.

Core Added / Modified Components

Diagnostics & Tracing Foundation

  • ActivityExtensions.cs (≈119 additions): Adds extension methods to attach standardized attributes (e.g., provider and model identifiers such as gen_ai.system and gen_ai.request.model in the OpenTelemetry GenAI conventions, token usage, error info).
  • AppContextSwitchHelper.cs: Utility for reading runtime feature toggles tied to diagnostics.
  • Extensions.cs: Registers OpenTelemetry, sets up exporters (Langfuse), and configures ActivitySource(s) to emit spans.

Routing & Execution Instrumentation

  • RoutingService (partial modifications in RoutingService.cs, InvokeAgent.cs, InvokeFunction.cs): Injects spans around agent resolution, function decision logic, and fallback/branching behaviors.
  • FunctionCallbackExecutor.cs / MCPToolExecutor.cs: Wrap function and MCP tool execution inside activities, capturing input/output payload summaries and potential exception traces.
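
The executor pattern described above can be sketched with the `System.Diagnostics` API alone. This is an illustration under assumptions, not the PR's implementation: the source name `Example.BotSharp.Functions`, the helper `ExecuteWithSpanAsync`, and the tag keys are all hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical source name; the PR's ActivitySource name is not shown here.
var source = new ActivitySource("Example.BotSharp.Functions");

// StartActivity returns null unless a listener samples the source,
// so register one that records everything and logs each finished span.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = a =>
        Console.WriteLine($"{a.DisplayName}: {a.Status} {a.StatusDescription}")
};
ActivitySource.AddActivityListener(listener);

// Wraps a function call in a span and marks the span as failed before rethrowing.
async Task<string> ExecuteWithSpanAsync(string functionName, Func<Task<string>> body)
{
    using var activity = source.StartActivity($"execute_tool {functionName}");
    activity?.SetTag("function.name", functionName); // hypothetical tag key
    try
    {
        var result = await body();
        activity?.SetTag("function.output_length", result.Length); // hypothetical tag key
        return result;
    }
    catch (Exception ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        throw;
    }
}

Console.WriteLine(await ExecuteWithSpanAsync("get_weather", () => Task.FromResult("sunny")));
```

Recording only the output length rather than the payload keeps the span useful without copying user data into the trace.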

Controller Layer

  • ConversationController.cs: Starts top-level request activities (entry points), enabling correlation IDs and trace continuity across chained model/tool operations.
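
Trace continuity from the controller down to the model call follows from how `ActivitySource` parents new spans to `Activity.Current`. The sketch below shows that nesting on .NET 5 or later (where the W3C ID format is the default); the source name and span names are hypothetical and do not come from the PR.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name used only for this illustration.
var source = new ActivitySource("Example.BotSharp.Conversation");

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

bool sameTrace, childOfAgent;

// Root span for the incoming request; ActivityKind.Server marks the entry point.
using (var request = source.StartActivity("POST /conversation", ActivityKind.Server))
{
    // Started while the request span is Activity.Current, so it becomes its child.
    using (var agent = source.StartActivity("invoke_agent"))
    {
        using var model = source.StartActivity("chat");
        sameTrace = model?.TraceId == request?.TraceId;
        childOfAgent = model?.ParentSpanId == agent?.SpanId;
    }
}

Console.WriteLine(sameTrace);    // True: all three spans share one trace ID
Console.WriteLine(childOfAgent); // True: the model span hangs under the agent span
```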

Provider Enhancements

Instrumentation added (or refactored) to:

  • ChatCompletionProvider.cs (multiple instances for different providers; diffs show added tracing blocks wrapping send/receive phases)
  • GiteeAI ChatCompletionProvider.cs: Full new implementation (≈496 lines) with prompt assembly, response parsing, diagnostic events.
  • TextEmbeddingProvider.cs (GiteeAI): Embedding generation with trace spans for vectorization operations.
  • ProviderHelper.cs: Centralizes client instantiation (e.g., constructing a configured HTTP client for GiteeAI with retries or headers).

GiteeAI Plugin Infrastructure

  • BotSharp.Plugin.GiteeAI.csproj: New project definition.
  • GiteeAiPlugin.cs: Dependency injection (DI) registration of chat and embedding providers.
  • Using.cs: Global using directives for plugin namespace cohesion.
  • BotSharp.sln / WebStarter.csproj: Solution and startup integration of the plugin.

Configuration & Settings

  • appsettings.json: Adds Langfuse credentials/config blocks and GiteeAI model definitions (plus adjustments for existing OpenAI/Azure OpenAI entries). Expansions suggest enabling multi-provider side-by-side tracing comparisons.

Documentation

  • README.md: Introduces usage/registration details for the GiteeAI plugin (likely minimal initial documentation, flagged for future expansion).

Formatting / Minor Adjustments

  • Program.cs: Minor cleanup (reordered usings, commented MCP service block temporarily).

Mermaid Diagram Walkthrough

1. High-Level Data & Trace Flow

flowchart LR
  Request["HTTP Request /conversation"] --> CC["ConversationController"]
  CC --> RT["RoutingService"]
  RT --> AGT["Agent Selection"]
  AGT --> DEC{"Dispatch"}
  DEC --> CHAT["ChatCompletion Provider (Azure/OpenAI/GiteeAI)"]
  DEC --> FUNC["FunctionCallbackExecutor"]
  DEC --> MCP["MCPToolExecutor"]
  CHAT --> DIAG["ModelDiagnostics"]
  FUNC --> DIAG
  MCP --> DIAG
  DIAG --> OTEL["OpenTelemetry ActivitySource"]
  OTEL --> LANG["Langfuse Exporter"]
  OTEL --> OTHER["Other OTEL Exporters (e.g., Console, OTLP)"]

2. Detailed Sequence (Conversation -> Agent -> Provider -> Trace)

sequenceDiagram
  participant User
  participant Controller as ConversationController
  participant Router as RoutingService
  participant Agent as Agent/Policy
  participant Provider as Model Provider (Azure/OpenAI/GiteeAI)
  participant Exec as Function/MCP Executor
  participant Diag as ModelDiagnostics
  participant OTel as OpenTelemetry
  participant Lang as Langfuse

  User->>Controller: Chat/Task request
  Controller->>Diag: Start root Activity (request span)
  Controller->>Router: Route(message, context)
  Router->>Agent: Evaluate agent(s)
  Agent-->>Router: Selected agent + strategy
  Router->>Diag: Start agent invocation span
  Router->>Provider: Generate response (chat or embedding)
  Provider->>Diag: Start model operation span
  Provider-->>Diag: Emit attributes (model, tokens, latency, etc.)
  Diag->>OTel: Activity emission
  OTel->>Lang: Export spans (Langfuse sink)
  alt Needs tool/function
    Router->>Exec: Invoke function/MCP tool
    Exec->>Diag: Start tool execution span
    Exec-->>Diag: Output / error metadata
  end
  Provider-->>Router: Response
  Router-->>Controller: Aggregated result
  Controller-->>User: Final structured reply

File Walkthrough (Categorized)

Configuration & Bootstrapping

  • Extensions.cs: Central OTel + Langfuse registration logic
  • appsettings.json: Adds Langfuse settings block and multi-model provider configuration (Azure OpenAI, OpenAI, GiteeAI)
  • Program.cs: Minor startup adjustments (commented MCP service portion)
  • BotSharp.sln / WebStarter.csproj: Solution integration of GiteeAI plugin project

Diagnostics Core

  • ModelDiagnostics.cs: Heart of instrumentation (chat, embedding, function, agent spans)
  • ActivityExtensions.cs: Tagging helpers (model name, message counts, timings, error classification)
  • AppContextSwitchHelper.cs: Toggle-based feature enabling (e.g., conditional tracing intensity)

Routing & Execution

  • RoutingService.cs / InvokeAgent.cs / InvokeFunction.cs: Introduces or expands span boundaries around decision phases
  • FunctionCallbackExecutor.cs / MCPToolExecutor.cs: Wraps function and MCP tool invocation with structured activities

Providers

  • ChatCompletionProvider.cs (multiple provider contexts including Azure/OpenAI): Inject span start/stop, response metrics
  • GiteeAI:
    • ChatCompletionProvider.cs: New provider with full instrumentation
    • TextEmbeddingProvider.cs: Embedding generation + trace capture
    • ProviderHelper.cs: Client construction utilities (e.g., auth/config composition)
    • Using.cs & GiteeAiPlugin.cs: Plugin-level DI and global usings

Documentation

  • README.md: Adds initial GiteeAI plugin usage or description

Miscellaneous

  • Minor formatting (Program.cs) and import adjustments (System.Diagnostics added to several files)

Observability & Semantic Conventions

The tracing layer appears to align with emerging GenAI semantic conventions:

  • gen_ai.request.model / gen_ai.system: Identify the model and the provider
  • Usage metrics: token counts, input/output sizes
  • Latency: operation timing
  • Error classification: enabling downstream analytics

This positions the system for cross-provider performance and reliability comparison, while Langfuse augments prompt/result auditing.
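
As a rough illustration of such attributes on a span, the sketch below sets a few keys from the OpenTelemetry GenAI semantic conventions. The model name and token numbers are made-up values, and whether ModelDiagnostics.cs uses exactly these key strings is an assumption; the subtraction of cached tokens mirrors the InputTokens line quoted earlier in this thread.

```csharp
using System;
using System.Diagnostics;

// Made-up values for illustration only.
var activity = new Activity("chat gpt-4o-mini");
int inputTokens = 1200, cachedTokens = 200, outputTokens = 350;

// Keys follow the OpenTelemetry GenAI semantic conventions (assumed to match the PR).
activity.SetTag("gen_ai.system", "openai");
activity.SetTag("gen_ai.request.model", "gpt-4o-mini");

// Same arithmetic as the PR: cached tokens are subtracted from the input count.
activity.SetTag("gen_ai.usage.input_tokens", inputTokens - cachedTokens);
activity.SetTag("gen_ai.usage.output_tokens", outputTokens);

foreach (var tag in activity.TagObjects)
{
    Console.WriteLine($"{tag.Key} = {tag.Value}");
}
```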

Commit-Level Summary (Conceptual)

While individual commit messages were not visible in the retrieved data, the 26 commits likely progressed through:

  1. Scaffolding diagnostics infrastructure (ActivitySources, settings).
  2. Introducing LangfuseSettings and configuration wiring.
  3. Instrumenting existing providers (Azure/OpenAI).
  4. Adding GiteeAI plugin project + chat provider + embedding provider.
  5. Enhancing routing and execution spans.
  6. Refactoring/removing prior Langfuse-specific project (unifying under ModelDiagnostics).
  7. Expanding appsettings.json for multi-provider support.
  8. Adding README plugin documentation.
  9. Final polish: minor formatting and controller tracing refinements.

Key Benefits

  • Unified tracing across heterogeneous model and tool operations.
  • Pluggable multi-model architecture (Azure OpenAI, OpenAI, GiteeAI) with consistent diagnostics.
  • Improved debuggability (span granularity at agent, function, model levels).
  • Extensibility: Future plugins can hook into ModelDiagnostics without bespoke exporter code.
  • Langfuse integration supports prompt/response observability and post-hoc analysis.

geffzhang avatar Nov 22 '25 13:11 geffzhang