[Feature Request] Add resolved callstacks to Execution Events
I'm hoping for the same New Thread Stack and Ready Thread Stack options that Windows CPU Usage (Precise) has.
Agreed, that would be nice.
Is this for LTTng or for Perfetto events?
While it is theoretically possible to capture a stackwalk with the newest tracing, decoding the stack and its symbols remains a challenge. Linux lacks a good symbol decoding method and format like the one on Windows. That is needed to decode the stackwalk, both for Linux/Android in general and for these trace formats.
I think we have experimental barely working support for LTTng for just KM (kernel-mode) stacks if a special file is provided similar to what Trace Compass does.
This would be for LTTng Execution Events.
Is "experimental barely working support for LTTng for just KM (kernel-mode) stacks" documented anywhere?
We're also evaluating VTune in our search for a WPA-like Linux profiler. It captures Transitions that involve callstacks with our resolved symbols in them. Not quite as comprehensive as Windows CPU Usage (Precise) would be, though.
> We're also evaluating VTune in our search for a WPA-like Linux profiler.
Since you used precise language about CPU Usage (Precise) and scheduling events, I know those are what you are interested in. However, just to be clear, and in case others can make use of this, I did want to say we do support Linux profiling with stacks (KM/UM). This is equivalent to Windows CPU Usage (Sampled).
You probably used the term "Linux profiler" here in the generic sense of the word, not the specific sense of an actual profiler, which samples the CPU at a fixed interval to determine which functions CPU time is spent in and the stack that led there (profiling). Last I checked, LTTng did not support profiling in this sense. Instead, these tools rely on cpu-clock events from the Linux kernel perf tool, where the stack is decoded by perf/Linux on the box before being read into our tool. All this is documented here - https://github.com/microsoft/Microsoft-Performance-Tools-Linux-Android/blob/develop/LinuxTraceLogCapture.md#perf
> Is "experimental barely working support for LTTng for just KM (kernel-mode) stacks" documented anywhere?
With that bit on profiling support out of the way, let me move to your specific follow-up question. No, the experimental support we have has not been documented before, but you could probably get it to work with a few bugfixes if you want to look. AFAIK symbol info has not been added to Trace Compass traces, although it could technically be done. Therefore, there has to be some manual way to resolve symbols. LTTng grabs the undecoded callstack, but something still needs to resolve the symbols. I will explain / document here where we are:
- Context / Inspiration - Trace Compass (OSS) is a popular GUI for reading LTTng traces, although it has different features than this toolkit and WPA. Trace Compass supports providing the kallsyms (KM only) and loading it in the GUI. See https://archive.eclipse.org/tracecompass.incubator/doc/org.eclipse.tracecompass.incubator.kernel.doc.user/User-Guide.html
- This is where we attempt to read kallsyms for Perf cpu-clock events converted to LTTng CTF format. The conversion is a bit of a PITA to do and not really recommended. I think the currently checked-in algorithm is wrong, or at least poor, but it gives you the idea of what to do and where to do it.
- Anyways, this could be fixed and ported to LTTng scheduling events so that at least the KM stacks could be decoded, similar to Trace Compass.
Maybe try the kallsyms symbol support in Trace Compass if you can, and see if it works well enough for you to want to use it. Then we would be open to a contribution here in these tools to get similar support.
Long term, here is what I would suggest for the LTTng and Linux folks to better support call stacks and symbols:
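
To make the kallsyms idea above concrete, here is a minimal sketch of resolving raw kernel-mode addresses against a kallsyms dump. This is not the code in this repo and not how Trace Compass does it; the function names `parse_kallsyms` and `resolve` are hypothetical, and the sample addresses are made up for illustration. It only assumes the usual `/proc/kallsyms` line layout (`<hex address> <type> <name> [module]`).

```python
import bisect
from typing import Iterable, List, Optional, Tuple

# Hypothetical sample in /proc/kallsyms layout; the addresses are invented.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81001000 T do_one_initcall
ffffffff81002400 t schedule_tail
ffffffff81003000 T __schedule
"""


def parse_kallsyms(lines: Iterable[str]) -> Tuple[List[int], List[str]]:
    """Parse kallsyms-style lines into parallel, address-sorted lists."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr = int(parts[0], 16)
        if addr == 0:
            continue  # addresses read as zero when the kernel hides them
        entries.append((addr, parts[2]))
    entries.sort()
    addrs = [a for a, _ in entries]
    names = [n for _, n in entries]
    return addrs, names


def resolve(table: Tuple[List[int], List[str]], ip: int) -> Optional[str]:
    """Return 'symbol+0xoffset' for the nearest symbol at or below ip."""
    addrs, names = table
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None  # ip is below the first known symbol
    return f"{names[i]}+0x{ip - addrs[i]:x}"


if __name__ == "__main__":
    table = parse_kallsyms(SAMPLE.splitlines())
    for ip in (0xFFFFFFFF81003010, 0xFFFFFFFF81002400, 0x1000):
        print(hex(ip), "->", resolve(table, ip))
```

Note that kallsyms carries no symbol sizes, so this sketch attributes any address above the last symbol to that symbol. A real implementation would also need to handle module ranges and the kallsyms file having to come from the same boot as the trace.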
- LTTng needs to support profiling (cpu-clock) events
- Embed binary id / signature info into the LTTng trace necessary for symbol decode (similar to what Windows does)
- Have some sort of symbol server or on-demand method to pull down symbols to decode (similar to what Windows does)
FYI, this (2) is now deprecated: kallsyms for Perf cpu-clock events converted to LTTng CTF format.