containerd icon indicating copy to clipboard operation
containerd copied to clipboard

Adding OpenTelemetry spans on the Go client's methods, CRI, GC and HTTP calls

Open kzys opened this issue 3 years ago • 7 comments

What is the problem you're trying to solve

containerd 1.6 will have OpenTelemetry spans, but they are added automatically around gRPC calls. There are a few areas we should have spans to make troubleshooting easier.

  • The Go client's methods
  • CRI
  • GC operations
  • HTTP calls around container image operations

Describe the solution you'd like

We need to first define our naming conversion based on https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions. After that we can update each component accordingly.

Additional context

No response

kzys avatar Feb 14 '22 20:02 kzys

Wahoo! My 2c is that spans should be on every entry/exit for tracing/debugging. So for example, CRI is all gRPC entry methods, it will automatically start a span. It does not need create internal sub-spans, (unless it wants to for some reason) but when it calls out to containerd via client (a new public API), or the task, or a windows/linux syscall, etc it may want to wrap that call in a span to show it's exit from the current code and entry into the next. This allows for entry and exit measurements, profiling, and tracing. Extremely helpful for both optimizing code, but also finding those hard to reach race conditions

jterry75 avatar Feb 14 '22 20:02 jterry75

Maybe another example since CRI is a "special" plugin. I think this idea holds for all of our services. Ex: We have a service api/local pattern. The span should be created at the service scope, once resolved to the local implementation we dont need a sub-span (most likely) but it is helpful for that service itself to have crated the child. This easily allows us to know when we have entered the "snapshotter service" etc. to do an operation and that it took 15 seconds.

jterry75 avatar Feb 14 '22 20:02 jterry75

add 1.7 milestone here cc @estesp

fuweid avatar Feb 23 '22 15:02 fuweid

@kzys Hi, can I work on this issue.

lengrongfu avatar Sep 23 '22 01:09 lengrongfu

@lengrongfu I did some work on this already and can have a PR ready by next week. I think we can reduce some overlap here since I am mainly focusing on CRI and related go client methods. Do you have any particular area of interest or if you can wait for my PR, that will be even better.

swagatbora90 avatar Sep 23 '22 19:09 swagatbora90

@lengrongfu I did some work on this already and can have a PR ready by next week. I think we can reduce some overlap here since I am mainly focusing on CRI and related go client methods. Do you have any particular area of interest or if you can wait for my PR, that will be even better.

I can pay attention to the point of "HTTP calls around container image operations", mainly where it interacts with the registry. There are also interactions with the registry in the CRI. Have you already dealt with it?

lengrongfu avatar Sep 24 '22 00:09 lengrongfu

I can pay attention to the point of "HTTP calls around container image operations", mainly where it interacts with the registry. There are also interactions with the registry in the CRI. Have you already dealt with it?

Sure, I have not looked into that yet. Also you can take a look at GC operations as well.

swagatbora90 avatar Sep 24 '22 02:09 swagatbora90

@swagatbora90 Hi, i will be working around The Go client's methods, will this work be repeated?

lengrongfu avatar Oct 04 '22 02:10 lengrongfu

@swagatbora90 Hi, i will be working around The Go client's methods, will this work be repeated?

HI @lengrongfu yes I am making changes to CRI runtime apis and and also adding spans in the related client methods which mainly in client.go, task.go, process.goand container.go.

swagatbora90 avatar Oct 04 '22 23:10 swagatbora90

I am listing out some of the current and planned items for adding otel spans to Containerd. This should help streamline some of the work, and others can contribute as well.

Current

  • [ ] GC operation - #7448
  • [x] CRI Image service - #7453
  • [ ] CRI Runtime service - #7616
  • [ ] Containerd Go client: #7616
    • [ ] client.go (7616)
    • [ ] container.go (7616)
    • [ ] process.go (7616)
    • [ ] task.go (7616)
    • [x] pull.go (7453)
  • [x] HTTP calls around container image operations - (7453)
  • [ ] CRI stream server - (#7692)
  • [x] Add Unit/Integration test #7493

This should at least cover the areas listed in this tracking issue. Though there might be scope for additional areas, like tracing calls across containerd and runc boundary.

swagatbora90 avatar Nov 01 '22 18:11 swagatbora90

@swagatbora90 I can look into the Add Unit/Integration test TODO task if you don't mind.

fangn2 avatar Nov 01 '22 21:11 fangn2