David Clark

35 comments by David Clark

Which NVIDIA driver version is installed? Do you know if it reproduces across multiple versions?

I believe this is an issue with driver 495.29.05—I reproduced the issue with OpenMM 7.6.0 and 7.7.0 with that driver version. However, updating the driver to a later 495 release...

Do you know how the GPUs are connected? Running `nvidia-smi topo -m` should print the topology. I ran the amber20-cellulose (408,609 atoms) benchmark via the benchmark.py script in the examples...
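For readers who want to check the same thing on their own node, here is a minimal sketch that runs `nvidia-smi topo -m` and reads the GPU-to-GPU part of the matrix. The helper names (`read_topology`, `gpu_links`, `uses_nvlink`) and the `SAMPLE` text are hypothetical and only illustrate the usual layout; they are not output from this thread. The sketch assumes the header row lists the GPU columns first and that NVLink entries start with `NV`, as described in the legend that `nvidia-smi` prints.

```python
import re
import shutil
import subprocess

# Illustrative text only (an assumption about the typical layout),
# not output taken from the machines discussed here.
SAMPLE = """\
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      PHB     0-15            0
GPU1    PHB      X      0-15            0
"""


def read_topology():
    """Return the text printed by `nvidia-smi topo -m`, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "topo", "-m"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


def gpu_links(topo_text):
    """Map each (row GPU, column GPU) pair to its link code, e.g. 'PHB' or 'NV2'."""
    links = {}
    columns = []
    for line in topo_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not columns:
            # The first line that names GPUs is treated as the header row.
            columns = re.findall(r"GPU\d+", line)
            continue
        if re.fullmatch(r"GPU\d+", tokens[0]):
            # The cells after the row label line up with the GPU columns.
            for col, code in zip(columns, tokens[1:]):
                if col != tokens[0]:  # skip the 'X' self entry
                    links[(tokens[0], col)] = code
    return links


def uses_nvlink(links):
    """True if any GPU pair is reported as connected through NVLink."""
    return any(code.startswith("NV") for code in links.values())


if __name__ == "__main__":
    text = read_topology() or SAMPLE  # fall back to the illustrative sample
    links = gpu_links(text)
    print(links)
    print("NVLink present:", uses_nvlink(links))
```

Note that the matrix only describes how the GPUs are wired together. Whether peer-to-peer access is actually enabled is a separate question that this sketch does not answer.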

Thanks! While they are not connected via NVLink, it looks like the GPUs still have P2P access. The node I tested on had a similar topology. Would it be possible...

That would be my guess—this is beyond my area of expertise. @peastman do you have any ideas?

The time resolution of `cuEventElapsedTime` should be around 0.5 microseconds. The non-NULL stream behavior does seem potentially problematic. I will look into other solutions!

Sorry, I lost track of this—I think we might be able to use CUDA events for this use-case. It looks like CUPTI could also be a possibility, but I am...

I think CUDA events will produce reasonably accurate results. While there are technically multiple streams per device, I don't think they will be an issue given what we would like...

I think it needs a `cuStreamSynchronize` call near line 266. I believe it may be calling `cuEventElapsedTime` before one of the events has completed.

I believe those are non-blocking on the host side: they only make the stream wait on the given event.