Memory Consistency for Memory Objects Created from SVM Allocations
Found while considering memory objects created from SVM allocations for the unified SVM extension:
What are the memory consistency expectations for memory objects created from SVM allocations?
Some specific questions to answer are:
- If a memory object is created from a coarse-grain SVM allocation:
  - a. When are writes to the memory object visible through the coarse-grain SVM allocation on the device?
  - b. When are writes to the coarse-grain SVM allocation visible through the memory object on the device?
  - c. Is it safe to concurrently write to non-overlapping addresses through the memory object and the coarse-grain SVM allocation?
  - d. Presumably any host access is reconciled when the memory object or SVM allocation is mapped?
  - e. What is the expected behavior if the memory object is mapped for reading and the coarse-grain SVM allocation is used on the device? Does it matter if the device is only reading from the coarse-grain SVM allocation?
  - f. What is the expected behavior if the memory object is mapped for writing and the coarse-grain SVM allocation is used on the device? Does it matter if the device is only reading from the coarse-grain SVM allocation?
  - g. Confirm: It is undefined behavior to read from or write to the memory object on the device while the coarse-grain SVM allocation is mapped for writing (description of clEnqueueSVMUnmap).
  - h. What is the expected behavior if both the memory object and the coarse-grain SVM allocation are mapped for writing? Is this an error? Is this undefined behavior?
  - i. Confirm: It is undefined behavior to write to the memory object on the device while the coarse-grain SVM allocation is mapped for reading (description of clEnqueueSVMUnmap).
- If a memory object is created from a fine-grain SVM allocation:
  - a. When are writes to the memory object visible through the fine-grain SVM allocation on the device?
  - b. When are writes to the fine-grain SVM allocation visible through the memory object on the device?
  - c. Is it safe to concurrently write to non-overlapping addresses through the memory object and the fine-grain SVM allocation?
  - d. If the fine-grain SVM allocation supports SVM atomics, is it safe to concurrently write to the same memory addresses atomically through the memory object and the fine-grain SVM allocation? What if the fine-grain SVM allocation does not support SVM atomics?
  - e. When are writes to the fine-grain SVM allocation on the host visible through the memory object on the device?
  - f. What is the expected behavior if the memory object is mapped for reading and the fine-grain SVM allocation is used on the device? Does it matter if the device is only reading from the fine-grain SVM allocation?
  - g. What is the expected behavior if the memory object is mapped for writing and the fine-grain SVM allocation is used on the device? Does it matter if the device is only reading from the fine-grain SVM allocation?
  - h. What is the expected behavior if both the memory object and the fine-grain SVM allocation are mapped for writing? Is this an error? Is this undefined behavior?
  - i. Confirm: It is OK to use the memory object on the device while the fine-grain SVM allocation is mapped for writing.
It's probably most intuitive to think of the "memory object" in the descriptions above as being a buffer memory object, but for added "fun", what if it is an image memory object?
We introduced this issue in the May 6th memory subgroup meeting, but we did not have time to discuss it in detail.
The key difference between the two scenarios is that a coarse-grain SVM allocation must be mapped to access it on the host, whereas no mapping is required for a fine-grain SVM allocation.
I started writing some tests to see whether implementations have address equivalence when a buffer memory object is created from an SVM allocation. Let's discuss next steps in our meeting today.
https://github.com/KhronosGroup/OpenCL-CTS/pull/2408