
evt.cputime

brendangregg opened this issue Apr 13 '14 · 8 comments

I'd like a field showing nanoseconds of CPU time for the current task, so that CPU time during calls can be inspected. This evt.cputime would only increment when that task (thread) was on-CPU. The kernel already tracks this, so it's a matter of exposing it.

brendangregg · Apr 13 '14

Just to make sure I understand: do you want the number that topprocs_cpu shows, but on a per-event basis rather than aggregated per second? Or something different?

ldegio · Apr 17 '14

Ah, thanks, %thread.exectime (which topprocs_cpu uses) looks like it should do what I want. Is it possible for it to be exported on more than just switch events?

What I'd like to do is time an event, e.g., a read() syscall, and determine whether the latency is due to time spent on-CPU or off-CPU. That determination directs further investigation to different tools.

%evt.latency or %evt.rawtime deltas show me the elapsed time for the read() syscall. A %thread.exectime delta would then be used to divide that elapsed time into two states: on-CPU and off-CPU.
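Roughly, the split I have in mind looks like this (illustrative C only, not sysdig code; the enter/exit exec-time samples stand for whatever field ends up exposing the per-thread CPU counter):

```c
/* Illustrative only, not sysdig code: split one syscall's latency into
 * on- and off-CPU components using the thread's exec-time counter sampled
 * at syscall enter and exit. */
#include <stdint.h>

static void split_latency(uint64_t latency_ns,          /* %evt.latency for the call */
                          uint64_t exectime_enter_ns,   /* thread exec time at enter */
                          uint64_t exectime_exit_ns,    /* thread exec time at exit  */
                          uint64_t *oncpu_ns, uint64_t *offcpu_ns)
{
    *oncpu_ns  = exectime_exit_ns - exectime_enter_ns;
    *offcpu_ns = latency_ns > *oncpu_ns ? latency_ns - *oncpu_ns : 0;
}
```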

brendangregg · Apr 17 '14

Hey Brendan, is this 807e50aaab2519b3a1ecbe725aea986e80525cac what you would expect?

ldegio · Apr 28 '14

I commented on the commit (by mistake); anyway, the interface looks OK, but I was getting zero.

brendangregg · Apr 28 '14

First of all, can you do a pull? b13df5ac336c3dd4ad21832ba822e96e99c90306 fixed an issue that caused the field not to be evaluated.

Then, the filter you're using rejects switch events, which are the ones that increment the thread CPU time. You should be able to include them by telling sysdig that the filter is a display one:

sysdig -d -p '%proc.name read "%fd.name", %evt.latency ns, %thread.totexectime CPU ns' 'evt.type=read and proc.name=dd'

By the way, here's how this works: the CPU time is updated by looking at scheduler switch events, so it's bumped up in discrete steps, once per switch. I'm not sure whether that's good enough for this application. A continuous CPU time would mean attaching the CPU counter to every event, which is not a trivial change.
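A rough sketch of that accounting model, just to be explicit (this is not the actual sysdig implementation, only the idea):

```c
/* Sketch of switch-driven CPU accounting (not the actual sysdig code).
 * On every scheduler switch event, the outgoing thread is charged for the
 * time elapsed since it was switched in; between switches the counter is
 * stale. */
#include <stdint.h>

struct thread_state {
    uint64_t last_switch_in_ns;   /* timestamp when the thread last got the CPU */
    uint64_t totexectime_ns;      /* what %thread.totexectime reports */
};

static void on_switch_event(struct thread_state *prev, struct thread_state *next,
                            uint64_t event_ts_ns)
{
    prev->totexectime_ns += event_ts_ns - prev->last_switch_in_ns;
    next->last_switch_in_ns = event_ts_ns;
}
```

Between two switch events the counter simply doesn't move, which is why a syscall that never gets preempted can show zero.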

ldegio · Apr 29 '14

I'm running:

dd if=/dev/zero of=/dev/null bs=1000000k count=5

And the one-liner reports the read()s with what looks like the right evt.latency, but zero thread.totexectime. The CPU time should roughly match the latency, since these syscalls spend their time moving bytes in system (kernel) time.

OK, I see: the kernel tracks it, but if it's only incremented on switch (and, it looks like, on /proc reads) then that makes it a bit tricky. If the last schedule time is kept somewhere (task_struct->ftrace_timestamp? or something in task_struct->se?), then the current time could be read (CPU TSC) and a delta calculated. Or maybe some of the same functions /proc uses could be called, e.g., task_cputime_adjusted().
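Something along these lines is what I mean (kernel-side sketch only; it ignores locking, and the scheduler really uses the per-runqueue clock rather than local_clock(), so treat it as approximate):

```c
/* Kernel-side sketch only: approximate a task's on-CPU nanoseconds right now
 * by adding the time since the last scheduler accounting update. Locking is
 * ignored and local_clock() is only an approximation of the runqueue clock
 * that exec_start is based on (on newer kernels local_clock() is declared in
 * <linux/sched/clock.h>). */
#include <linux/sched.h>

static u64 approx_exec_runtime_ns(struct task_struct *p)
{
    u64 runtime = p->se.sum_exec_runtime;   /* ns accumulated at accounting points */

    if (task_curr(p))                       /* task is on a CPU right now */
        runtime += local_clock() - p->se.exec_start;

    /* Alternative: task_cputime_adjusted() gives utime/stime the same way a
     * /proc/<pid>/stat read does, at coarser (tick) granularity. */
    return runtime;
}
```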

brendangregg · May 25 '14

Extracting the schedule time is definitely feasible, and that means that adding this feature for live analysis can be done (relatively) easily.

Remember, however, that one of the core philosophies behind sysdig is that observing the live system or taking a capture should give you the exact same result. So I see 3 choices:

  1. attach the CPU counter to specific events like switch. This is what we do now and, as you point out, it's not ideal because it doesn't offer the precision required by some use cases.
  2. attach the CPU counter to every event. This should solve the problem, but creates a major overhead in terms of capture buffer occupation and trace file size.
  3. accept that, for some metrics, the symmetry between live and offline analysis cannot be achieved, and export the functionality for live analysis only.

A possible compromise is implementing #2, but keeping it disabled by default. In other words, there would be a command line switch (and a chisel API call) to turn on per-event CPU capture when needed. This is feasible but non-trivial to implement, so we need to understand how to prioritize it based on the importance of the use cases.
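To put a rough number on the option-2 overhead, here's a back-of-envelope sketch; the extra 8 bytes per event and the event rates are assumptions, not measurements:

```c
/* Back-of-envelope cost of option 2: one extra 64-bit CPU-time counter
 * attached to every captured event. The event rates below are illustrative
 * guesses, not measurements. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t counter_bytes = sizeof(uint64_t);    /* 8 bytes per event  */
    const uint64_t rates[] = {50000, 200000, 1000000};  /* events per second  */

    for (int i = 0; i < 3; i++) {
        double mib_per_min = (double)(counter_bytes * rates[i] * 60) / (1024.0 * 1024.0);
        printf("%8llu evt/s -> ~%.0f MiB/min of extra capture data\n",
               (unsigned long long)rates[i], mib_per_min);
    }
    return 0;
}
```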

Thoughts?

By the way, it might be worth moving this discussion to the mailing list...

ldegio · May 25 '14

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] · Mar 03 '23