
[Issue]: SMI event entry abruptly null-terminated

Open briankoco opened this issue 7 months ago • 2 comments

Problem Description

The kernel driver reports SVM/SMI events through a kfifo that is exported for userspace profiler consumption. Each SMI event is formatted as a newline-terminated string.
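The newline-delimited contract can be illustrated with a minimal userspace sketch. It assumes that the consumer holds the bytes read from the event file descriptor in a buffer. The function names are hypothetical and are not part of any ROCm API. The sketch contrasts a byte-oriented scan with a C-string scan, and the difference only matters when a record contains a NUL byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count complete '\n'-terminated records in a raw byte buffer of len bytes.
 * memchr() scans bytes, so an embedded NUL does not stop the scan.
 * (Hypothetical helper, for illustration only.) */
size_t count_records_raw(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p;
    const char *end;

    assert(buf != NULL);
    p = buf;
    end = buf + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            break; /* trailing partial record: no newline yet */
        count++;
        p = nl + 1;
    }
    return count;
}

/* The same count using C string functions: strchr() stops at the first
 * NUL byte, so any newline after an embedded NUL is never seen.
 * (Hypothetical helper, for illustration only.) */
size_t count_records_str(const char *buf)
{
    size_t count = 0;
    const char *nl;

    assert(buf != NULL);
    while ((nl = strchr(buf, '\n')) != NULL) {
        count++;
        buf = nl + 1;
    }
    return count;
}
```

For a well-formed buffer, both helpers agree. For a buffer such as "1 a\0\n2 b\n", the byte scan still finds two records, but the string scan finds none, because it stops at the NUL.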

KFD_SMI_EVENT_QUEUE_RESTORE events are currently formatted incorrectly: a NUL character is written into the event string before the trailing newline. This terminates the event string prematurely and breaks userspace parsing that expects newline-delimited events:

  • https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c#L316
  • https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/include/uapi/linux/kfd_ioctl.h#L744
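The issue does not say which formatting step inserts the NUL. The sketch below assumes, purely for illustration, that the NUL comes from a `%c` conversion given a zero argument. This is one way a printf-style formatter can emit a NUL just before `\n`, so check the linked source for the actual cause. `DEMO_EVENT_FMT` and `format_demo_event` are hypothetical names, and the field layout is only loosely modeled on a queue-restore event. The sketch uses userspace `snprintf`, not the kernel's formatter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: timestamp, negated pid, node id, a one-character
 * flag, then the record newline. Not the kernel's actual macro. */
#define DEMO_EVENT_FMT "%lld -%d %x %c\n"

/* Format one demo event into buf. Returns snprintf's count of bytes
 * produced (excluding the final terminator). If flag is 0, %c writes a
 * literal NUL byte immediately before the '\n'. */
int format_demo_event(char *buf, size_t size, char flag)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, DEMO_EVENT_FMT, 123456789LL, 42, 0x1u, flag);
}
```

When `flag` is a printable character, `strlen(buf)` equals the returned length. When `flag` is 0, the returned byte count still covers the trailing `\n`. However, a string-based consumer stops two bytes early and never sees the record delimiter.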

Operating System

Ubuntu 24.04.2 LTS (Noble Numbat)

CPU

Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz

GPU

AMD Radeon Graphics gfx906

ROCm Version

6.4.0

ROCm Component

ROCK-Kernel-Driver

Steps to Reproduce

  1. Install ROCm on the relevant kernel driver version
  2. Set HSA_SVM_PROFILE=svm.txt
  3. Set HSA_XNACK=0
  4. Run a HIP application that experiences a queue eviction. One example is the hmmstress workload.
  5. Observe that the SMI profile dumped to svm.txt contains improperly formatted events, because the queue restore events do not end with a newline.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

briankoco, Jul 08 '25 17:07