Fixes cudaErrorInvalidValue when running on nvbench-created CUDA streams
This PR fixes a minor issue that can occur when nvbench runs on multiple GPUs without a user-provided CUDA stream.
The Issue
The error I observed in this case looked like this:
Fail: Unexpected error: nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument
When running with memcheck, I saw:
Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemsetAsync.
The Problem
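It seems that nvbench creates all nvbench-owned streams on device 0. Because a CUDA stream is associated with the device that is current when the stream is created, work issued for another device (such as the cudaMemsetAsync call in l2flush.cuh) ends up using a stream that belongs to device 0.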
Suggested Fix
This fix ensures that each stream is created on the device on which it is later used.
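A minimal sketch of the idea behind the fix, assuming a RAII guard along the lines of nvbench's detail/device_scope.cuh (whose exact interface is not shown here). To keep it runnable without a GPU, the CUDA runtime's per-thread current device is modelled by a hypothetical `fake_runtime` namespace. In real code, `get_device`, `set_device` and `create_stream` would be `cudaGetDevice`, `cudaSetDevice` and `cudaStreamCreate`, and `device_scope_sketch` and `make_stream_on` are likewise illustrative names.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the CUDA runtime so the sketch runs without a GPU.
namespace fake_runtime {
inline int current_device = 0; // models the calling thread's current device
inline int device_count   = 2; // models the result of cudaGetDeviceCount

inline int get_device() { return current_device; }

inline void set_device(int id)
{
  if (id < 0 || id >= device_count)
  {
    throw std::runtime_error("invalid device ordinal");
  }
  current_device = id;
}

// A stream is bound to whichever device is current when it is created.
struct stream
{
  int device;
};
inline stream create_stream() { return stream{current_device}; }
} // namespace fake_runtime

// RAII guard: makes `id` the current device and restores the previous one on exit.
class device_scope_sketch
{
public:
  explicit device_scope_sketch(int id)
      : m_prev{fake_runtime::get_device()}
  {
    fake_runtime::set_device(id);
  }
  ~device_scope_sketch() { fake_runtime::set_device(m_prev); }

  device_scope_sketch(const device_scope_sketch &)            = delete;
  device_scope_sketch &operator=(const device_scope_sketch &) = delete;

private:
  int m_prev;
};

// The fix in a nutshell: switch to the target device before creating the stream.
inline fake_runtime::stream make_stream_on(int device_id)
{
  device_scope_sketch guard{device_id};
  return fake_runtime::create_stream();
}
```

Without the guard, `fake_runtime::create_stream()` always yields a stream on device 0, which corresponds to the behaviour described above. Requesting a device that does not exist throws inside the guard's constructor and leaves the current device unchanged.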
LGTM, thanks for catching this! Some of the tests don't build after these changes. If you have Docker set up, you can run ci/local/build.bash from the nvbench root to build and test.
Once the tests pass, this is good to go.
Thanks for reviewing the PR. nvbench::cuda_stream used to be default-constructible, and it is part of the public API.
In this PR, I had made cuda_stream's constructor require a std::optional<nvbench::device_info>, which was effectively a breaking change. To avoid that, I've now added the default constructor back to cuda_stream.
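One way to keep a type default-constructible while also accepting an optional target device is to have the default constructor delegate to the new one with `std::nullopt`. The sketch below uses hypothetical `device_info_sketch` and `cuda_stream_sketch` types rather than nvbench's real classes, and it only records the requested device instead of creating an actual stream.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

// Hypothetical stand-in for nvbench::device_info; only the id matters here.
struct device_info_sketch
{
  int id;
};

class cuda_stream_sketch
{
public:
  // Kept for backward compatibility: no device means "use the current device".
  cuda_stream_sketch()
      : cuda_stream_sketch(std::nullopt)
  {}

  // New constructor: with a device, a real stream would be created on that
  // device (e.g. inside a device-scope guard); here we only remember which one.
  explicit cuda_stream_sketch(std::optional<device_info_sketch> device)
  {
    if (device)
    {
      m_device_id = device->id;
    }
  }

  [[nodiscard]] std::optional<int> device_id() const { return m_device_id; }

private:
  std::optional<int> m_device_id; // empty: created on the current device
};
```

Delegating keeps a single code path for stream creation, so existing callers that write `cuda_stream_sketch s;` keep compiling while new callers can pin the stream to a specific device.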
@elstehle I'm still seeing a test regression when running ci/local/build.bash on this branch:
4/39 Test #32: nvbench.test.state_generator ..................***Failed 2.39 sec
/cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal
Command: 'cudaSetDevice(dev_id)'
Thanks! Sorry, I missed that regression because it only occurs on systems with fewer than three devices.
The issue with the test in testing/state_generator.cu was that it generates states for devices [0, 1, 2], regardless of whether those devices actually exist:
const auto device_0 = nvbench::device_info{0, {}};
const auto device_1 = nvbench::device_info{1, {}};
const auto device_2 = nvbench::device_info{2, {}};
dummy_bench bench;
bench.set_devices({device_0, device_1, device_2});
...
const std::vector<nvbench::state> states = nvbench::detail::state_generator::create(bench);
When the states are created, the stream for each state is created on that state's device. If that device doesn't exist, switching to it fails with a CUDA error.
For comparison, running a benchmark with invalid device IDs already makes the runner fail with the same error:
../nvbench/device_info.cuh:71: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal
I resolved this regression by adjusting the test in testing/state_generator.cu to run only on devices that are actually available on the system. However, I'd like to confirm that we're generally OK with this behaviour.
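A minimal sketch of that kind of adjustment, not the actual change made to testing/state_generator.cu: a hypothetical `available_devices` helper keeps only the requested device ids that exist. Here `device_count` is passed in so the logic can be tested without a GPU; in a real test it would come from a query such as `cudaGetDeviceCount`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: keep only the requested device ids that exist on this
// system, preserving their order. Valid ordinals are 0 .. device_count - 1.
std::vector<int> available_devices(const std::vector<int> &requested, int device_count)
{
  std::vector<int> result;
  for (int id : requested)
  {
    if (id >= 0 && id < device_count)
    {
      result.push_back(id);
    }
  }
  return result;
}
```

With two GPUs, requesting devices {0, 1, 2} would leave only {0, 1}. The test then builds states for those devices and never calls cudaSetDevice with an invalid ordinal.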