David Clark comments

Results: 35 comments of David Clark

Large difference between OpenCL and other platform during test

Which NVIDIA driver version is installed? Do you know if it reproduces across multiple versions?

Large difference between OpenCL and other platform during test

I believe this is an issue with driver 495.29.05—I reproduced the issue with OpenMM 7.6.0 and 7.7.0 with that driver version. However, updating the driver to a later 495 release...

Multi-GPU simulation shows slow-down than 1-GPU simulation

Do you know how the GPUs are connected? Running `nvidia-smi topo -m` should print the topology. I ran the amber20-cellulose (408,609 atoms) benchmark via the benchmark.py script in the examples...

Multi-GPU simulation shows slow-down than 1-GPU simulation

Thanks! While they are not connected via NVLink, it looks like the GPUs still have P2P access. The node I tested on had a similar topology. Would it be possible...
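To make the NVLink question concrete, here is a minimal Python sketch that reads the matrix printed by `nvidia-smi topo -m` and lists the GPU pairs whose link code starts with `NV`. The helper names (`parse_topology`, `nvlink_pairs`) and the `SAMPLE` text are hypothetical and written for illustration only; the sample was not captured from a real machine. The sketch assumes the usual layout of a header row of `GPU<n>` labels followed by one row per GPU, and it says nothing about whether P2P access is enabled.

```python
import re

GPU_LABEL = re.compile(r"^GPU\d+$")
ANSI_STYLE = re.compile(r"\x1b\[[0-9;]*m")  # some versions underline the header row


def parse_topology(text):
    """Return {(row_gpu, col_gpu): link_code} from `nvidia-smi topo -m` text."""
    columns = []
    links = {}
    for line in ANSI_STYLE.sub("", text).splitlines():
        tokens = line.split()
        if not tokens or not GPU_LABEL.match(tokens[0]):
            continue  # skips blank lines, NIC rows, and the legend
        if not columns:
            # The first GPU-led line is the header; keep only its leading GPU labels.
            for token in tokens:
                if not GPU_LABEL.match(token):
                    break
                columns.append(token)
            continue
        row = tokens[0]
        for column, code in zip(columns, tokens[1:]):
            if column != row:  # the diagonal entry is "X" (self)
                links[(row, column)] = code
    return links


def nvlink_pairs(links):
    """Return the GPU pairs whose link code indicates NVLink (NV1, NV2, ...)."""
    return [pair for pair, code in links.items() if code.startswith("NV")]


# Hypothetical two-GPU topology, for illustration only.
SAMPLE = """\
        GPU0    GPU1    CPU Affinity
GPU0     X      PHB     0-15
GPU1    PHB      X      0-15
"""

if __name__ == "__main__":
    links = parse_topology(SAMPLE)
    print(links)
    print("NVLink pairs:", nvlink_pairs(links))
```

On a real node the text would come from running the command itself, for example with `subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True).stdout`.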

Multi-GPU simulation shows slow-down than 1-GPU simulation

That would be my guess—this is beyond my area of expertise. @peastman do you have any ideas?

Improve load balancing on multiple GPUs

The time resolution of `cuEventElapsedTime` should be around 0.5 microseconds. The non-NULL stream behavior does seem potentially problematic. I will look into other solutions!

Improve load balancing on multiple GPUs

Sorry, I lost track of this—I think we might be able to use CUDA events for this use-case. It looks like CUPTI could also be a possibility, but I am...

Improve load balancing on multiple GPUs

I think CUDA events will produce reasonably accurate results. While there are technically multiple streams per device, I don't think they will be an issue given what we would like...
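As a rough illustration of how per-device timings could feed a load balancer, the Python sketch below redistributes work in proportion to each device's measured throughput. This is an assumption made for illustration and not necessarily the scheme OpenMM uses; `rebalance`, `fractions`, and `elapsed_ms` are hypothetical names, and the elapsed times are assumed to come from something like CUDA event timing.

```python
def rebalance(fractions, elapsed_ms):
    """Return new work fractions proportional to each device's measured throughput.

    fractions:  current share of the work on each device (should sum to 1).
    elapsed_ms: time each device took to finish its current share.
    """
    if len(fractions) != len(elapsed_ms):
        raise ValueError("fractions and elapsed_ms must have the same length")
    if any(t <= 0 for t in elapsed_ms):
        raise ValueError("elapsed times must be positive")

    # Throughput is work done per unit time on each device.
    throughput = [f / t for f, t in zip(fractions, elapsed_ms)]
    total = sum(throughput)
    if total <= 0:
        raise ValueError("at least one device must have a non-zero share")

    # Normalize so the new shares sum to 1.
    return [rate / total for rate in throughput]


if __name__ == "__main__":
    # Two devices with equal shares, where the second finished twice as fast.
    print(rebalance([0.5, 0.5], [2.0, 1.0]))
```

A real implementation would probably damp the update, for instance by averaging the new fractions with the old ones, so that timing noise does not make the split oscillate.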

Improve load balancing on multiple GPUs

I think it needs a `cuStreamSynchronize` call near line 266. I believe it may be calling `cuEventElapsedTime` before one of the events has completed.

Improve load balancing on multiple GPUs

I believe those are non-blocking on the host side—it is making the stream wait on the given event