Testing fails with segmentation fault

Open mckees opened this issue 1 year ago • 4 comments

OS: Ubuntu 24.04, freshly provisioned
Hardware: i7-1185G7, i9-12900 (reproduced on both)
Level Zero version: 1.19.2
Setup: I am working to get the level-zero version in Ubuntu bumped to 1.19.2, which is why I'm installing from a PPA.

sudo apt install -y build-essential cmake
sudo add-apt-repository ppa:mckeesh/testing
sudo apt -y update
sudo apt install libze1=1.19.2-0ubuntu1
git clone https://github.com/oneapi-src/level-zero
cd level-zero/
git checkout v1.19.2
mkdir build
cd build/
cmake -DBUILD_L0_LOADER_TESTS=yes ..
make -j`nproc`
./bin/tests 

Result:

Running main() from /home/ubuntu/level-zero/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from LoaderAPI
[ RUN      ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned
/home/ubuntu/level-zero/test/loader_api.cpp:26: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

Found 1 versions

component.component_name: loader
component.component_lib_version.major: 1
component.spec_version: 65548
component.component_lib_name: loader

[  FAILED  ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned (0 ms)
[----------] 1 test from LoaderAPI (0 ms total)

[----------] 11 tests from LoaderInit
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithTypesUnsupportedWithFailureThenSupportedTypesThenSuccessReturned
Segmentation fault (core dumped)
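
A side note on the `component.spec_version` value in the log above: as far as I know, Level Zero packs API versions as `(major << 16) | minor`, so the decimal number can be unpacked by hand. A minimal sketch, where `SpecVersion` and `decode_spec_version` are hypothetical names used only for illustration and are not part of the loader:

```cpp
#include <cstdint>

// Hypothetical holder for an unpacked version; not a Level Zero type.
struct SpecVersion {
    std::uint32_t major_part;
    std::uint32_t minor_part;
};

// Assumes the packing (major << 16) | (minor & 0xffff).
SpecVersion decode_spec_version(std::uint32_t packed) {
    return { packed >> 16, packed & 0xffffu };
}
```

Under that assumption, 65548 unpacks to 1.12.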

mckees, Jan 24 '25 06:01

For Reference: https://dgpu-docs.intel.com/driver/client/overview.html
View the section on Ubuntu 24.04

Two different things happening here:

Error code 2013265921 (decimal) is ZE_RESULT_ERROR_UNINITIALIZED = 0x78000001 ( https://github.com/oneapi-src/level-zero/blob/1c0320bfdf0afe4b361e0297f9d10ac9dd6756fd/include/ze_api.h#L217 ). This indicates that the L0 Loader cannot find a working Intel(R) GPU or NPU driver on the system, that those drivers failed to find a valid device to attach to, or that the system didn't load a kernel driver. I see you posted the CPUs, both of which have valid HW, so the hardware isn't the issue, but are the KMD drivers actually running, and is the user mode driver installed?
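
For anyone reading GoogleTest output, which prints result codes in decimal, here is a small sketch of the conversion described above. `to_hex` and `describe_result` are hypothetical helpers, not Level Zero API, and only the two values that appear in this thread are mapped:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: render a result code the way ze_api.h writes it (hex).
std::string to_hex(std::uint32_t code) {
    std::ostringstream out;
    out << "0x" << std::hex << code;
    return out.str();
}

// Hypothetical helper: name the two result codes seen in this thread.
// Any other value is reported as unknown rather than guessed.
std::string describe_result(std::uint32_t code) {
    if (code == 0u) return "ZE_RESULT_SUCCESS";
    if (code == 0x78000001u) return "ZE_RESULT_ERROR_UNINITIALIZED";
    return "unknown (" + to_hex(code) + ")";
}
```

Calling `describe_result(2013265921u)` gives the name used in the explanation above.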

First, check whether you have a User Mode Driver installed, then re-test: $ sudo apt install libze-intel-gpu1 (I am assuming your PPA has this package; if not, install the one Intel is hosting.)

If that doesn't work, check dmesg to confirm that the i915.ko kernel driver is being loaded for an Intel GPU during boot (potentially also xekmd.ko if you've got anything newer in a discrete slot).

The segfault clearly shouldn't happen, but it is likely an error induced by the missing drivers. We should address the segfault within the L0 Loader.
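
Independently of a loader-side fix, a caller can avoid this kind of crash by not touching a handle that was never populated. In GoogleTest, `EXPECT_*` records a failure and keeps going, while `ASSERT_*` returns from the current function, so a failed non-fatal check can be followed by a null dereference. A minimal sketch of the guard idea, using a hypothetical stand-in type rather than real Level Zero handles or the actual test code:

```cpp
#include <string>

// Hypothetical stand-in for a device object; not a Level Zero type.
struct FakeDevice {
    std::string name;
};

// Writes the device name to `out` and returns true, or returns false
// without dereferencing anything when no device was found.
bool try_get_device_name(const FakeDevice* device, std::string& out) {
    if (device == nullptr) {
        return false;  // bail out early: there is nothing to query
    }
    out = device->name;
    return true;
}
```

This only illustrates the early-return pattern; where the crash in this issue actually originates is not established here.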

rwmcguir, Jan 25 '25 00:01

Hmm, interesting. I tried moving to Ubuntu 24.10 since we have been working more on that and have had validation teams check our stack. Here's what I'm seeing now:

ubuntu@hp-elite-mini-800-g9-desktop-pc-c29603:~/level-zero/build$ apt policy libze1 
libze1:
  Installed: 1.19.2.0-1076~24.10
  Candidate: 1.19.2.0-1076~24.10
  Version table:
 *** 1.19.2.0-1076~24.10 500
        500 https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu oracular/main amd64 Packages
        100 /var/lib/dpkg/status
     1.17.42-1 500
        500 http://archive.ubuntu.com/ubuntu oracular/universe amd64 Packages
ubuntu@hp-elite-mini-800-g9-desktop-pc-c29603:~/level-zero/build$ apt policy libze-intel-gpu1 
libze-intel-gpu1:
  Installed: 24.52.32224.5-1~24.10~ppa2
  Candidate: 24.52.32224.5-1~24.10~ppa2
  Version table:
 *** 24.52.32224.5-1~24.10~ppa2 500
        500 https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu oracular/main amd64 Packages
        100 /var/lib/dpkg/status
     24.35.30872.24-1 500
        500 http://archive.ubuntu.com/ubuntu oracular/universe amd64 Packages
ubuntu@hp-elite-mini-800-g9-desktop-pc-c29603:~/level-zero/build$ ./bin/zello_world 
Driver not initialized: ZE_RESULT_ERROR_UNINITIALIZED
Did NOT find matching ZE_DEVICE_TYPE_GPU device!

mckees, Feb 06 '25 06:02

Showing the same tests as before: more tests run after installing an updated libze-intel-gpu1 version, but there are still the segfault and uninitialized issues:

Running main() from /home/ubuntu/level-zero/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from LoaderAPI
[ RUN      ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned
/home/ubuntu/level-zero/test/loader_api.cpp:26: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

Found 2 versions

component.component_name: loader
component.component_lib_version.major: 1
component.spec_version: 65547
component.component_lib_name: loader

component.component_name: tracing layer
component.component_lib_version.major: 1
component.spec_version: 65547
component.component_lib_name: tracing layer

[  FAILED  ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned (24 ms)
[----------] 1 test from LoaderAPI (24 ms total)

[----------] 11 tests from LoaderInit
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithTypesUnsupportedWithFailureThenSupportedTypesThenSuccessReturned
/home/ubuntu/level-zero/test/loader_api.cpp:64: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:65: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithTypesUnsupportedWithFailureThenSupportedTypesThenSuccessReturned (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithGPUTypeThenExpectPassWithGPUorAllOnly
/home/ubuntu/level-zero/test/loader_api.cpp:77: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:78: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:81: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:82: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:85: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:86: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithGPUTypeThenExpectPassWithGPUorAllOnly (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithNPUTypeThenExpectPassWithNPUorAllOnly
/home/ubuntu/level-zero/test/loader_api.cpp:98: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:99: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:102: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:103: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:106: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:107: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithNPUTypeThenExpectPassWithNPUorAllOnly (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithAnyTypeWithNullDriverAcceptingAllThenExpectatLeast1Driver
/home/ubuntu/level-zero/test/loader_api.cpp:119: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:120: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:123: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:124: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:127: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:128: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:131: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:132: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithAnyTypeWithNullDriverAcceptingAllThenExpectatLeast1Driver (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithAllTypes
/home/ubuntu/level-zero/test/loader_api.cpp:145: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:146: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:147: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:149: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithAllTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithGPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:162: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:163: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:164: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_GPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:166: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithGPUTypes (0 ms)
[ RUN      ] LoaderInit.GivenZeInitDriversUnsupportedOnTheDriverWhenCallingZeInitDriversThenUninitializedReturned
/home/ubuntu/level-zero/test/loader_api.cpp:181: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:183: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenZeInitDriversUnsupportedOnTheDriverWhenCallingZeInitDriversThenUninitializedReturned (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithNPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:196: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:197: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:198: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_VPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:200: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithNPUTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithAllTypes
/home/ubuntu/level-zero/test/loader_api.cpp:213: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:214: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:215: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:217: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithAllTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithGPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:230: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_GPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:231: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:232: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:234: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithGPUTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithNPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:247: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_VPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:248: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:249: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:251: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithNPUTypes (0 ms)
[----------] 11 tests from LoaderInit (0 ms total)

[----------] 3 tests from LoaderValidation
[ RUN      ] LoaderValidation.GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsThenValidationLayerPrintsWarningOfDeadlock
/home/ubuntu/level-zero/test/loader_validation_layer.cpp:28: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_GPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_validation_layer.cpp:29: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_validation_layer.cpp:30: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_validation_layer.cpp:72: Failure
Expected: (pDevice) != (nullptr), actual: NULL vs (nullptr)

Segmentation fault (core dumped)

mckees, Feb 06 '25 06:02

Alright, I did 2 things to improve my situation:

  1. Installed the NPU UMD, which provided /usr/lib/x86_64-linux-gnu/libze_intel_vpu.so.1
  2. Ran the tests as root. This only became obviously necessary once strace showed it was failing to access /usr/lib/x86_64-linux-gnu/libze_intel_vpu.so.1
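
A quick way to check the same thing without strace is to see whether the current user can open the driver library for reading. This is a rough sketch: `is_readable` is a hypothetical helper, and it only tests file access, not whether the loader can actually load the library:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true if `path` can be opened for reading
// by the user running this process.
bool is_readable(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    return file.good();
}
```

For example, `is_readable("/usr/lib/x86_64-linux-gnu/libze_intel_vpu.so.1")` returning false as a normal user but true as root would match the permission problem described above.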

After that, the tests seemed to be getting stuck, so I did some debugging and found that it was hanging on a call to zeCommandQueueSynchronize. I made the following change to get past that by reducing the timeout:

diff --git a/test/loader_validation_layer.cpp b/test/loader_validation_layer.cpp
index d04f795..2138d90 100644
--- a/test/loader_validation_layer.cpp
+++ b/test/loader_validation_layer.cpp
@@ -168,26 +180,36 @@ TEST(
     status = zeCommandQueueCreate(context, pDevice, &command_queue_description, &command_queue);
     EXPECT_EQ(ZE_RESULT_SUCCESS, status);
 
+    std::cout << "lvl 6" << std::endl;
+
     status = zeCommandQueueExecuteCommandLists(command_queue, 1, &command_list, nullptr);
     EXPECT_EQ(ZE_RESULT_SUCCESS, status);
+    std::cout << "lvl 6.1" << std::endl;
 
-    status = zeCommandQueueSynchronize(command_queue, UINT64_MAX);
+    status = zeCommandQueueSynchronize(command_queue, 10000000000);
     EXPECT_EQ(ZE_RESULT_SUCCESS, status);

From there, I was able to get through all the tests on i9-12900.
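
For reference, my understanding is that the timeout argument of zeCommandQueueSynchronize is in nanoseconds, so the literal 10000000000 in the diff above is 10 seconds, and UINT64_MAX means wait indefinitely. A small sketch that makes the unit explicit (`timeout_ns` is a hypothetical helper, not part of Level Zero):

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: convert whole seconds to a nanosecond count,
// assuming the API expects its timeout in nanoseconds.
std::uint64_t timeout_ns(std::chrono::seconds s) {
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(s).count());
}
```

With this, the changed line could read `zeCommandQueueSynchronize(command_queue, timeout_ns(std::chrono::seconds(10)))`. Note that a finite timeout lets the call return before the work completes, so the existing EXPECT_EQ on the status would then report a failure rather than hang.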

However, on Core Ultra 7 268V, I'm still getting a test hang here in ze_libapi.cpp during the test LoaderValidation.GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsThenValidationLayerPrintsWarningOfDeadlock:

ze_result_t ZE_APICALL
zeMemFree(
    ze_context_handle_t hContext,                   ///< [in] handle of the context object
    void* ptr                                       ///< [in][release] pointer to memory to free
    )
{
    if(ze_lib::context->inTeardown) {
        return ZE_RESULT_ERROR_UNINITIALIZED;
    }

    auto pfnFree = ze_lib::context->zeDdiTable.load()->Mem.pfnFree;
    if( nullptr == pfnFree ) {
        if(!ze_lib::context->isInitialized)
            return ZE_RESULT_ERROR_UNINITIALIZED;
        else
            return ZE_RESULT_ERROR_UNSUPPORTED_FEATURE;
    }

    return pfnFree( hContext, ptr ); /////////////// GETS STUCK ///////////////////
}

mckees, Feb 07 '25 01:02