Function to synchronize an entire device.
The existing implementation of device::finish() only synchronizes the current stream (e.g., calling cuStreamSynchronize), making both the function name and documentation somewhat misleading.
Some downstream OCCA applications require a mechanism to wait for all enqueued operations on a device to finish, similar to cudaDeviceSynchronize.
The programming models of the other backends (i.e., OpenCL, SYCL) don't have a similar API for device synchronization. However, modeDevice_t already retains a vector of the streams that have been allocated, so a device-wide wait can be implemented by synchronizing each of those streams in turn.
Two potential options to move forward with this are:
- Change the implementation of device::finish() to match its name and documentation, then add a function to the stream class for synchronizing only a particular stream (and possibly a shortcut to synch the current stream).
- Keep the current implementation of device::finish(), but update its documentation and add another function device::finishAll() which synchronizes all streams on a device.
After discussing this at the OCCA TAF meeting, we will go with the second option: adding a new function, finishAll(), to the occa::device class.
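
To make the intended difference between the two calls concrete, here is a minimal, self-contained sketch. It does not use the OCCA API: MockStream, MockDevice, and all of their members are hypothetical stand-ins, and "synchronizing" a stream is modeled by clearing a counter of pending operations. It only illustrates the idea of walking the retained vector of streams; a real backend would call its own stream-synchronization routine inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backend stream; this is not the OCCA stream type.
struct MockStream {
  int pending = 0;                        // operations enqueued but not yet complete
  void enqueue(int ops) { pending += ops; }
  void synchronize() { pending = 0; }     // models a blocking per-stream wait
};

// Hypothetical stand-in for a device that retains every stream it allocates.
class MockDevice {
 public:
  // Allocates a stream, retains it, and returns its index.
  std::size_t createStream() {
    streams_.emplace_back();
    return streams_.size() - 1;
  }

  void setStream(std::size_t i) { current_ = i; }
  MockStream& stream(std::size_t i) { return streams_.at(i); }

  // Current behavior described above: wait only on the current stream.
  // Throws std::out_of_range if no stream has been created yet.
  void finish() { streams_.at(current_).synchronize(); }

  // Proposed behavior: wait on every stream the device has retained.
  void finishAll() {
    for (MockStream& s : streams_) s.synchronize();
  }

  // Total number of operations still pending across all streams.
  int totalPending() const {
    int total = 0;
    for (const MockStream& s : streams_) total += s.pending;
    return total;
  }

 private:
  std::vector<MockStream> streams_;
  std::size_t current_ = 0;
};

// Small usage example: work on a non-current stream survives finish()
// but not finishAll().
inline void runExample() {
  MockDevice dev;
  const std::size_t a = dev.createStream();
  const std::size_t b = dev.createStream();
  dev.stream(a).enqueue(3);
  dev.stream(b).enqueue(2);

  dev.setStream(a);
  dev.finish();                      // only stream a is drained
  assert(dev.totalPending() == 2);

  dev.finishAll();                   // every retained stream is drained
  assert(dev.totalPending() == 0);
}
```

In this sketch, finish() leaves work on the non-current stream outstanding, while finishAll() drains everything the device has allocated, which is the behavior downstream applications are asking for.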