dd-trace-py icon indicating copy to clipboard operation
dd-trace-py copied to clipboard

ddtrace==2.13.1 profiler issue

Open doudoujay opened this issue 1 year ago • 3 comments

After i updated our docker's ddtrace from ddtrace==2.9.3 to ddtrace==2.13.1, we observed some weird behaviors on profiler. We upgraded on Oct 11.

Heap - Bytes in Use drop to 0 Image Image

Allocated Bytes Significantly increased - Image Image Huge chunk of memory consumed by periodic ensure_binary

Increased CPU Time Image Image

doudoujay avatar Oct 17 '24 18:10 doudoujay

any updates?

doudoujay avatar Oct 22 '24 01:10 doudoujay

@mabdinur @taegyunkim any updates on this?

doudoujay avatar Oct 23 '24 20:10 doudoujay

@doudoujay Hi could you share environment variables that you're giving to ddtrace? e.g. DD_PROFILING_* and DD_TRACE_*. Could you also share Python version too?

taegyunkim avatar Oct 23 '24 23:10 taegyunkim

ENV DD_LOGS_ENABLED=true
ENV DD_LOGS_INJECTION=true
ENV DD_TRACE_ENABLED=true
ENV DD_PROFILING_ENABLED=true
# https://docs.datadoghq.com/tracing/trace_collection/custom_instrumentation/python/otel/
ENV DD_TRACE_OTEL_ENABLED=true
# https://docs.datadoghq.com/profiler/connect_traces_and_profiles/#identify-code-hotspots-in-slow-traces
# only available for ver 2.10.5+
ENV DD_PROFILING_TIMELINE_ENABLED=true
# https://docs.datadoghq.com/dynamic_instrumentation/enabling/python/?tab=environmentvariables
ENV DD_DYNAMIC_INSTRUMENTATION_ENABLED=true
# https://docs.datadoghq.com/tracing/metrics/runtime_metrics/python/
ENV DD_RUNTIME_METRICS_ENABLED=true
# https://docs.datadoghq.com/infrastructure/process/?tab=docker
ENV DD_PROCESS_CONFIG_PROCESS_COLLECTION_ENABLED=true
# code tracking
ARG DD_GIT_REPOSITORY_URL
ARG DD_GIT_COMMIT_SHA
ENV DD_GIT_REPOSITORY_URL=${DD_GIT_REPOSITORY_URL}
ENV DD_GIT_COMMIT_SHA=${DD_GIT_COMMIT_SHA}

@taegyunkim

doudoujay avatar Oct 24 '24 00:10 doudoujay

  1. ensure_binary_or_empty shows up in your profiles as you've set DD_PROFILING_TIMELINE_ENABLED. The flag enables native profile exporter which uses ensure_binary_or_empty to convert Python str to utf-8 encoded bytes which is then passed to C++ native code as std::string_view.
  2. You don't see any frame below ensure_binary_or_empty as the function calls CPython's built-in str.encode() which is implemented in C here and dd-trace-py profiler doesn't show native frames.
  3. Looks like str.encode() always creates an object, allocating memory, even if the given str object is already in the desired encoding, in our case utf-8. We probably should use lower level CPython API, probably PyUnicode_AsUTF8AndSize(), which simply returns const char*, and pass that to C++ native code.

Thank you for pointing this out @doudoujay, will soon work on a fix for this, < 2 weeks.

taegyunkim avatar Oct 24 '24 01:10 taegyunkim

So i guess for short term solution would be we turn off DD_PROFILING_TIMELINE_ENABLED? @taegyunkim

doudoujay avatar Oct 24 '24 07:10 doudoujay

Yes, could you try turning it off?

taegyunkim avatar Oct 24 '24 13:10 taegyunkim

@taegyunkim hey its still taking significant chunk of CPU cycle and memory - Image Image dd version 2.14.4

dd flags:

ENV DD_LOGS_ENABLED=true
ENV DD_LOGS_INJECTION=true
ENV DD_TRACE_ENABLED=true
ENV DD_PROFILING_ENABLED=true
# https://docs.datadoghq.com/tracing/trace_collection/custom_instrumentation/python/otel/
ENV DD_TRACE_OTEL_ENABLED=true
# https://docs.datadoghq.com/dynamic_instrumentation/enabling/python/?tab=environmentvariables
ENV DD_DYNAMIC_INSTRUMENTATION_ENABLED=true
# https://docs.datadoghq.com/tracing/metrics/runtime_metrics/python/
ENV DD_RUNTIME_METRICS_ENABLED=true
# https://docs.datadoghq.com/infrastructure/process/?tab=docker
ENV DD_PROCESS_CONFIG_PROCESS_COLLECTION_ENABLED=true
# code tracking
ARG DD_GIT_REPOSITORY_URL
ARG DD_GIT_COMMIT_SHA
ENV DD_GIT_REPOSITORY_URL=${DD_GIT_REPOSITORY_URL}
ENV DD_GIT_COMMIT_SHA=${DD_GIT_COMMIT_SHA}

doudoujay avatar Oct 24 '24 20:10 doudoujay

@taegyunkim any updates?

doudoujay avatar Oct 28 '24 17:10 doudoujay

@doudoujay that flamegraph doesn't make sense to me as it shows ensure_binary_or_empty even though you don't have none of DD_PROFILING_EXPORT_LIBDD_ENABLED, DD_PROFILING_STACK_V2_ENABLED, or DD_PROFILING_TIMELINE_ENABLED. Could you check again your DD_* environment variables and flamegraph? You can also try to filter the profile using profiler_config:*py* to make sure that the flamegraph you get is only from those services without above environment variables set.

taegyunkim avatar Oct 29 '24 17:10 taegyunkim

@doudoujay Can you also try enabling DD_PROFILING_STACK_V2_ENABLED?

taegyunkim avatar Oct 29 '24 17:10 taegyunkim

Could you check again your DD_* environment variables and flamegraph Im sure i didnt turn on DD_PROFILING_TIMELINE_ENABLED or any of them. @taegyunkim

doudoujay avatar Oct 29 '24 19:10 doudoujay

I will try enable DD_PROFILING_STACK_V2_ENABLED. Whats the expected behavior? @taegyunkim

will soon work on a fix for this, < 2 weeks. When do we expect the fix is in production?

doudoujay avatar Oct 29 '24 19:10 doudoujay

DD_PROFILING_STACK_V2_ENABLED enables our new stack profiler which implemented in native languages, mostly C++/Rust. When you turn it on it also turn on DD_PROFILING_EXPORT_LIBDD_ENABLED and will show less of ensure_binary_or_empty().

Missing live heap profile looks like a separate issue, and we'll need to investigate more on that.

taegyunkim avatar Oct 30 '24 14:10 taegyunkim

And do you mind sharing what Python version you're using? From our testing, Python 3.12 doesn't have the issue ensure_binary_or_empty() allocating a lot of memory.

taegyunkim avatar Oct 30 '24 15:10 taegyunkim

And do you mind sharing what Python version you're using? From our testing, Python 3.12 doesn't have the issue ensure_binary_or_empty() allocating a lot of memory. @taegyunkim Im using python 3.1

FROM python:3.10

saw you closed this did you got it fixed?

doudoujay avatar Nov 05 '24 07:11 doudoujay

Hello @taegyunkim We use python 3.12.2 and we run into high CPU usage issue when upgrading ddtrace. We tried last week ddtrace 2.16.0 and we still notice CPU increase: Image We had to revert to 2.9.2.

slallema avatar Nov 12 '24 10:11 slallema

Seeing the same thing, just rolled back

VivaLaPanda avatar Dec 18 '24 19:12 VivaLaPanda

Using python:3.12-slim and ddtrace 2.18.1 with ddtrace-run uvicorn caused high CPU usage. I rolled back to 2.9.2 and it worked for me.

yeisonvargasf avatar Dec 30 '24 23:12 yeisonvargasf