Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream

Open elstehle opened this issue 3 years ago • 3 comments

This PR fixes a minor issue that may occur when nvbench is run on multiple GPUs without a user-provided cuda stream.

The Issue

The error that I observed in this case looked like:

Fail: Unexpected error: nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument

When run with memcheck I would see:

Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemsetAsync.

The Problem

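It seems that nvbench creates all of the nvbench-owned streams on device 0. When such a stream is later used while a different device is current, calls such as cudaMemsetAsync reject it with the error above.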

Suggested Fix

This fix makes sure that each stream is created on the device on which it is later used.
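To illustrate the idea, here is a minimal sketch that runs without a GPU. Everything in `mock_rt` is a hypothetical stand-in for the CUDA runtime, not the real API, and `make_stream_on` is an illustrative helper rather than the code in this PR. The mock only models the behaviour reported above: a stream is bound to whichever device was current at creation, and work submitted to it from another device is rejected.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the CUDA runtime (NOT the real API).
namespace mock_rt
{
inline int &current_device()
{
  static int id = 0; // device 0 is current by default
  return id;
}

struct stream
{
  int device; // device that was current when the stream was created
};

inline void set_device(int id) { current_device() = id; }
inline stream create_stream() { return stream{current_device()}; }

// Models the reported failure: a stream owned by another device is rejected.
inline void memset_async(const stream &s)
{
  if (s.device != current_device())
  {
    throw std::invalid_argument("invalid argument");
  }
}
} // namespace mock_rt

// RAII guard: makes `id` the current device, restores the previous one on exit.
class device_scope
{
public:
  explicit device_scope(int id)
      : m_old_id(mock_rt::current_device())
  {
    mock_rt::set_device(id);
  }
  ~device_scope() { mock_rt::set_device(m_old_id); }
  device_scope(const device_scope &)            = delete;
  device_scope &operator=(const device_scope &) = delete;

private:
  int m_old_id;
};

// Creates the stream while the target device is current, so the stream
// belongs to the device it will later be used on.
inline mock_rt::stream make_stream_on(int device_id)
{
  assert(device_id >= 0);
  device_scope scope{device_id};
  return mock_rt::create_stream();
}
```

With the real runtime, the same pattern would presumably wrap the stream creation call in a device scope for the benchmark's device.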

elstehle avatar Dec 07 '22 12:12 elstehle

This LGTM, thanks for catching it! Some of the tests don't build after the changes. You can run ci/local/build.bash from the nvbench root to build and test if you have Docker set up.

Once tests are passing this is good to go.

Thanks for reviewing the PR. nvbench::cuda_stream used to be default constructible and is part of the public API. In this PR, I required passing a std::optional<nvbench::device_info> to cuda_stream's ctor, which was effectively a breaking change. To avoid that, I've now added the default ctor back to cuda_stream.
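As a rough sketch of that constructor shape (not the actual nvbench code): `device_info` and `cuda_stream` below are simplified, hypothetical stand-ins that only record a device id, and treating an empty optional as "use the current device" is an assumption made for illustration.

```cpp
#include <cassert>
#include <optional>

// Simplified, hypothetical stand-in; the real nvbench::device_info differs.
struct device_info
{
  int id;
};

class cuda_stream
{
public:
  // Device-aware ctor: an empty optional means "no specific device requested"
  // (assumed here to fall back to the current device).
  explicit cuda_stream(std::optional<device_info> device)
      : m_device_id(device ? device->id : current_device_id())
  {
    assert(m_device_id >= 0);
  }

  // Default ctor kept so existing `cuda_stream{}` call sites still compile.
  cuda_stream()
      : cuda_stream(std::nullopt)
  {}

  int device_id() const { return m_device_id; }

private:
  // Placeholder for querying the runtime's current device.
  static int current_device_id() { return 0; }

  int m_device_id;
};
```

Delegating the default ctor to the optional-taking one keeps a single code path for stream creation while leaving the old call syntax valid.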

elstehle avatar Jan 18 '23 15:01 elstehle

@elstehle I'm still seeing a test regression when running ci/local/build.bash on this branch:

 4/39 Test #32: nvbench.test.state_generator ..................***Failed    2.39 sec
/cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal
Command: 'cudaSetDevice(dev_id)'

alliepiper avatar Jan 30 '23 18:01 alliepiper

Thanks! Sorry, I had missed that regression, as it only occurred on systems with three devices or fewer.

The issue with the test in testing/state_generator.cu was that it generated states for devices [0, 1, 2], regardless of whether those devices exist:

const auto device_0 = nvbench::device_info{0, {}};
const auto device_1 = nvbench::device_info{1, {}};
const auto device_2 = nvbench::device_info{2, {}};

dummy_bench bench;
bench.set_devices({device_0, device_1, device_2});
...
const std::vector<nvbench::state> states = nvbench::detail::state_generator::create(bench);

When the states are created, we create the stream for each state on that state's device. If that device doesn't exist, we run into a CUDA error.

For comparison, if we currently ran a benchmark with invalid device IDs, the runner would fail with the same error:

../nvbench/device_info.cuh:71: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal

I resolved this regression by adjusting the test in testing/state_generator.cu to only use devices that are actually available in the system. But I would like to confirm that we're generally OK with that behaviour.
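A minimal sketch of that kind of filtering, under stated assumptions: `available_device_ids` is a hypothetical helper, not the code in the test, and the device count is passed in as a parameter so the example runs without a GPU. In a real test the count would presumably come from a runtime query such as cudaGetDeviceCount.

```cpp
#include <cassert>
#include <vector>

// Keeps only the requested device ids that exist on a system with
// `device_count` devices (valid ids are 0 .. device_count - 1).
inline std::vector<int> available_device_ids(const std::vector<int> &requested,
                                             int device_count)
{
  assert(device_count >= 0);
  std::vector<int> result;
  for (const int id : requested)
  {
    if (id >= 0 && id < device_count)
    {
      result.push_back(id);
    }
  }
  return result;
}
```

For example, requesting ids 0, 1, and 2 on a single-GPU machine would leave only id 0, so no state is generated for a device that cannot be selected.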

elstehle avatar Jan 31 '23 17:01 elstehle