langfuse-python

fix: fix Chinese character escaping issue

Open RiviaAzusa opened this issue 4 months ago • 3 comments

Add the ensure_ascii=False parameter to all json.dumps calls to ensure non-ASCII characters (such as Chinese) are not escaped into \uXXXX format.
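As a minimal illustration of the behavior described above, using only the standard library (the sample payload is made up for this sketch and is not taken from the SDK):

```python
import json

payload = {"input": "你好"}  # hypothetical sample data

escaped = json.dumps(payload)                       # default: ensure_ascii=True
readable = json.dumps(payload, ensure_ascii=False)  # keep non-ASCII as-is

print(escaped)   # {"input": "\u4f60\u597d"}
print(readable)  # {"input": "你好"}
```

Both strings decode to the same object with json.loads; only the textual representation differs.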

Files to modify:

  • langfuse/_client/attributes.py: Core serialization functions
  • langfuse/_utils/request.py: API request data serialization
  • langfuse/_client/utils.py: Debug output formatting
  • langfuse/_task_manager/score_ingestion_consumer.py: Score processing serialization

Before Fix & After Fix: [screenshot omitted: CleanShot 2025-09-09 at 14 06 54@2x]


[!IMPORTANT] Add ensure_ascii=False to json.dumps calls to prevent non-ASCII character escaping in multiple files.

  • Behavior:
    • Add ensure_ascii=False to json.dumps in attributes.py for core serialization functions.
    • Add ensure_ascii=False to json.dumps in request.py for API request data serialization.
    • Add ensure_ascii=False to json.dumps in utils.py for debug output formatting.
    • Add ensure_ascii=False to json.dumps in score_ingestion_consumer.py for score processing serialization.

This description was created by Ellipsis for 843ed967a7c63db4e302a16ca04060b96856d465.

Disclaimer: Experimental PR review

Greptile Summary

Updated On: 2025-09-09 06:09:40 UTC

This PR addresses a localization issue where non-ASCII characters (specifically Chinese characters) were being escaped to \uXXXX format in JSON serialization throughout the Langfuse Python SDK. The fix adds the ensure_ascii=False parameter to all json.dumps() calls across four key files to preserve Unicode characters in their original readable form.

The changes affect multiple serialization touchpoints:

  • langfuse/_client/attributes.py: Core serialization for OpenTelemetry span attributes
  • langfuse/_utils/request.py: API request data serialization sent to Langfuse servers
  • langfuse/_client/utils.py: Debug output formatting for span data
  • langfuse/_task_manager/score_ingestion_consumer.py: Score processing serialization and size calculations
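One side effect worth keeping in mind for the size calculations mentioned in the last bullet: the serialized string changes length once escaping is disabled, and character count no longer equals byte count. The sketch below uses only the standard library. How the SDK actually measures size is not shown in this PR, so measuring UTF-8 bytes here is an assumption.

```python
import json

score = {"comment": "中文"}  # hypothetical score payload

escaped = json.dumps(score)
readable = json.dumps(score, ensure_ascii=False)

# Escaped form is pure ASCII, so characters and bytes coincide.
escaped_bytes = len(escaped.encode("utf-8"))

# Readable form: each of these CJK characters is 1 character but 3 UTF-8 bytes.
readable_chars = len(readable)
readable_bytes = len(readable.encode("utf-8"))

print(escaped_bytes, readable_chars, readable_bytes)  # 27 17 21
```

If a size limit is enforced in bytes, measuring the encoded bytes rather than len() of the string avoids undercounting non-ASCII payloads.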

All modified files use the existing EventSerializer class but now explicitly disable ASCII-only encoding. This ensures that when users trace LLM applications containing Chinese text (or other Unicode content), the data remains human-readable in the Langfuse UI, logs, and debug output. The change maintains JSON validity while improving the user experience for international developers.
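To illustrate how a custom encoder class combines with ensure_ascii=False, and that the result is still valid JSON, here is a small sketch. SketchSerializer is a hypothetical stand-in written for this example; it is not the SDK's EventSerializer and makes no claim about how that class is implemented.

```python
import json
from datetime import datetime, timezone


class SketchSerializer(json.JSONEncoder):
    """Hypothetical encoder for illustration only."""

    def default(self, o):
        # Convert datetimes to ISO 8601 strings; defer everything else.
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)


event = {
    "name": "翻译",  # hypothetical span name
    "timestamp": datetime(2025, 9, 9, tzinfo=timezone.utc),
}

# The custom encoder and ensure_ascii=False are independent options.
serialized = json.dumps(event, cls=SketchSerializer, ensure_ascii=False)

# The output remains valid JSON and round-trips through json.loads.
restored = json.loads(serialized)
```

The round trip shows that disabling escaping changes only the text representation, not the decoded data.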

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it only affects JSON output formatting without changing data structures or API compatibility
  • Score reflects that the changes are straightforward, well-targeted, and follow a consistent pattern across all affected files
  • No files require special attention as all changes follow the same simple pattern of adding the ensure_ascii=False parameter

RiviaAzusa Sep 09 '25 06:09

CLA assistant check
All committers have signed the CLA.

CLAassistant Sep 09 '25 06:09

Hi @RiviaAzusa, thank you for the contribution! If possible, could you add at least one test for this behavior so that we do not run into this problem again? Thanks!

nimarb Sep 16 '25 13:09
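A regression test along the lines requested above might look like the following pytest-style sketch. The serialize helper is a hypothetical stand-in defined here so the example is self-contained; a real test would call the SDK's own serialization function instead.

```python
import json


def serialize(value):
    # Hypothetical stand-in for the SDK serialization helper under test.
    return json.dumps(value, ensure_ascii=False)


def test_non_ascii_is_not_escaped():
    out = serialize({"text": "中文测试"})
    # The original characters appear verbatim in the output.
    assert "中文测试" in out
    # No \uXXXX escape sequences were produced.
    assert "\\u" not in out
    # The output still decodes to the original data.
    assert json.loads(out) == {"text": "中文测试"}
```

Checking both the absence of escapes and the round trip guards against a fix that keeps the text readable but breaks JSON validity.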

I really want this PR to be merged.

gfx Oct 03 '25 07:10