Incorrect device numbers in target callbacks and records.
I added the following to this repo's OMPT example:
int initial_device = omp_get_initial_device();
printf("initial device_num = %d\n", initial_device);
#pragma omp target data map(to: i) device(3)
{
    #pragma omp target device(3)
    {
        int device_num = omp_get_device_num();
        printf("device_num = %d\n", device_num);
        for (int j = 0; j < N; j++)
            a[j] = b[j];
    }
}
The corresponding output I see is:
initial device_num = 4
Callback Target: target_id=1 kind=2 endpoint=1 device_num=3 code=0x2100e5
Init: device_num=3 type=AMD gfx908 device=0x714983 lookup=0x7f59195a1280 doc=(nil)
Load: device_num:3 filename:(null) host_adddr:0x201640 device_addr:(nil) bytes:52600
Callback DataOp: target_id=1 host_op_id=2 optype=1 src=0x7ffc7b80579c src_device_num=3 dest=(nil) dest_device_num=0 bytes=4 code=0x7f59232d8500
Allocated 256 bytes at 0x940120 in buffer request callback
Callback DataOp: target_id=1 host_op_id=3 optype=2 src=0x7ffc7b80579c src_device_num=0 dest=0x7f5912400000 dest_device_num=3 bytes=4 code=0x7f59232d83cd
Allocated 256 bytes at 0x93fa70 in buffer request callback
Callback Target: target_id=1 kind=2 endpoint=2 device_num=3 code=0x2100e5
Callback Target: target_id=4 kind=1 endpoint=1 device_num=3 code=0x2101ae
Callback DataOp: target_id=4 host_op_id=5 optype=1 src=0x7ffc7b867530 src_device_num=3 dest=(nil) dest_device_num=0 bytes=400000 code=0x7f59232d8500
Allocated 256 bytes at 0x940bd0 in buffer request callback
Callback DataOp: target_id=4 host_op_id=6 optype=2 src=0x7ffc7b867530 src_device_num=0 dest=0x7f5912401000 dest_device_num=3 bytes=400000 code=0x7f59232d83cd
Executing buffer complete callback: 0 0x940120 208 0x940120 0
rec=0x940120 type=9 time=1651265829237242163 thread_id=0 target_id=1
Record DataOp: host_op_id=2 optype=1 src_addr=0x7f5912400000 src_device=3 dest_addr=0x7ffc7b80579c dest_device=3 bytes=4 end_time=1651265829237293543 duration=51380 ns codeptr=0x7f59232d8500
rec=0x940188 type=9 time=1651265829237295293 thread_id=0 target_id=1
Record DataOp: host_op_id=3 optype=2 src_addr=0x7ffc7b80579c src_device=3 dest_addr=0x7f5912400000 dest_device=3 bytes=4 end_time=1651265829238104440 duration=809147 ns codeptr=0x7f59232d83cd
Executing buffer complete callback: 0 0x940120 0 (nil) 1
Deallocated 0x940120
I've repeated this with device(0), device(1), and device(2) in place of device(3) and see the same pattern: the callbacks always report data moving between device number 0 and the selected device.
So it seems the OpenMP runtime passes 0 as the device number in the ompt_callback_target_data_op_emi callback both for the initial (host) device, whose number according to the OpenMP spec equals the number of devices (4 here, as printed above), and for the actual device 0.
A possibly separate issue is that, at least in the Record DataOp records, src_device == dest_device == the non-initial device number (3 above), even though the addresses show one side is on the host.
Fixed with this commit https://github.com/RadeonOpenCompute/llvm-project/commit/2d13fd548e385fcd6e3fe85b6261c10428ef6029
This should have been fixed for some time now. @ZealotTKO, can you confirm and close the issue? Thanks.