
Celery subtask traces are missing (distributed tracing)

Open jheyder213 opened this issue 3 years ago • 5 comments

Hello all,

I am facing a bug very similar to the one described here

Which version of dd-trace-py are you using?

1.1.4

Which version of pip are you using?

pip 22.0.3

Which version of the libraries are you using?

celery 5.2.7, redis 4.3.3

How can we reproduce your problem?

Create an application with Celery workers and a Redis broker, with ddtrace installed. Set the environment variable DD_CELERY_DISTRIBUTED_TRACING="true" and enable Celery tracing. Annotate your functions with @tracer.wrap to emit custom spans. Split worker tasks into subtasks, e.g. backend -> worker task A -> worker tasks B and C.
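
A minimal sketch of that setup (the module name, task names, and broker URL are illustrative, not taken from the reporter's codebase):

# tasks.py - sketch of the reproduction; names and broker URL are illustrative.
# Workers are started with the flag enabled, e.g.:
#   DD_CELERY_DISTRIBUTED_TRACING="true" ddtrace-run celery -A tasks worker
from celery import Celery
from ddtrace import tracer

app = Celery(__name__, broker="redis://localhost:6379/0")

@app.task
@tracer.wrap()
def worker_task_b():
    ...

@app.task
@tracer.wrap()
def worker_task_c():
    ...

@app.task
@tracer.wrap()
def worker_task_a():
    # the backend calls worker_task_a, which fans out into subtasks B and C
    worker_task_b.delay()
    worker_task_c.delay()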

What is the result that you get?

Only traces (spans) from the backend and worker task A are shown in Datadog. Traces for worker tasks B and C only show up if there is an error in the computation.

What is the result that you expected?

Traces for worker tasks B and C always show up.

Since I can see the traces in the main worker task, and the subtask traces do show up when an error is raised, I assume the application is configured correctly. However, Celery subtasks are not traced correctly.

jheyder213 avatar Jun 21 '22 08:06 jheyder213

@Kyle-Verhoog Do you have tests that ensure this should work?

ghost avatar Jun 22 '22 09:06 ghost

Retracting my earlier statement. This is working as expected.

adawalli avatar Jul 06 '22 01:07 adawalli

@adawalli Could you be more specific? Do you have tests confirming that the traces for tasks B and C are collected?

This is still an issue on our project.

ghost avatar Jul 08 '22 10:07 ghost

I don't have tests per se, but here is a minimal example:

import logging

from celery import Celery

logger = logging.getLogger(__name__)
celery = Celery(__name__)  # broker configuration omitted

@celery.task
def b():
    logger.info("test")

@celery.task
def a():
    logger.info("test")

@celery.task
def c():
    # c fans out into subtasks a and b
    a.delay()
    b.delay()

With distributed tracing turned on, I can easily see in Datadog that the trace ID from c is propagated as the trace ID in a and b (I can also dump the headers on the RabbitMQ message and see the headers added).

With Celery distributed tracing turned off, a, b, and c all have separate trace IDs.
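
A quick way to check this from inside a task (a sketch; tracer.current_span is the public ddtrace API, and the task and logger setup are assumed from the snippet above):

from ddtrace import tracer

@celery.task
def d():
    span = tracer.current_span()
    if span is not None:
        # With DD_CELERY_DISTRIBUTED_TRACING enabled on both the producer
        # and the worker, the trace_id logged here should match the caller's.
        logger.info("trace_id=%s span_id=%s", span.trace_id, span.span_id)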

adawalli avatar Jul 08 '22 15:07 adawalli

Did you test it with Redis instead of RabbitMQ as well? I am wondering what is causing the information loss for us. If an error is raised in one of the subtasks, we do receive the traces, so they can be traced but are not delivered.

ghost avatar Jul 08 '22 15:07 ghost

hey @jheyder213! Sorry for the super late reply. Are you setting DD_CELERY_DISTRIBUTED_TRACING on the worker processes as well? This is a common gotcha we see with the Celery integration: the environment variable has to be set on both the producer and the consumer processes.
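
For example (the command lines are illustrative; the point is that the flag has to be present in the environment of both processes):

# producer side, e.g. the web backend that enqueues the tasks
DD_CELERY_DISTRIBUTED_TRACING="true" ddtrace-run gunicorn myapp.wsgi

# consumer side, the Celery worker processes
DD_CELERY_DISTRIBUTED_TRACING="true" ddtrace-run celery -A myapp.tasks worker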

We do have tests covering the distributed tracing feature. Celery has always been a pain for us, though, so it could be a bug that you're seeing.

Kyle-Verhoog avatar Sep 22 '22 08:09 Kyle-Verhoog

Hello @Kyle-Verhoog,

Thank you for the answer! We are setting the DD_CELERY_DISTRIBUTED_TRACING flag on the worker processes as well. It turned out that the traces were being silently dropped because they were too large. Increasing the DD_TRACE_WRITER_BUFFER_SIZE_BYTES and DD_TRACE_WRITER_MAX_PAYLOAD_SIZE_BYTES limits resolved the issue.
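
For anyone hitting the same symptom, the change looked roughly like this (the values below are illustrative, not a recommendation):

# set in the environment of the process that emits the large traces
export DD_TRACE_WRITER_BUFFER_SIZE_BYTES=16777216       # e.g. 16 MiB
export DD_TRACE_WRITER_MAX_PAYLOAD_SIZE_BYTES=16777216  # e.g. 16 MiB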

It would be great to have a better indication within the Datadog UI that traces are dropped because they exceed the maximum size (or to have the truncated trace submitted instead).

ghost avatar Sep 22 '22 08:09 ghost

hey @jheyder213, thanks for the quick follow-up!

Ah I see, that would do it.

It would be great to have a better indication within the Datadog UI that traces are dropped because they exceed the maximum size (or to have the truncated trace submitted instead).

Yes! This is something that is in the works! We'll hopefully be seeing this soon 👀

I'm going to close the issue now that it is resolved 🙂

Kyle-Verhoog avatar Sep 22 '22 14:09 Kyle-Verhoog