Incorrect device numbers in target callbacks and records.

Open ZealotTKO opened this issue 4 years ago • 2 comments

I added the following to this repo's OMPT example:

  int initial_device = omp_get_initial_device();
  printf("initial device_num = %d\n", initial_device);
  #pragma omp target data map(to: i) device(3)
  {
    #pragma omp target device(3)
    {
      int device_num = omp_get_device_num();
      printf("device_num = %d\n", device_num);
      for (int j = 0; j < N; j++)
        a[j] = b[j];
    }
  }

The corresponding output I see is:

initial device_num = 4
Callback Target: target_id=1 kind=2 endpoint=1 device_num=3 code=0x2100e5
Init: device_num=3 type=AMD gfx908 device=0x714983 lookup=0x7f59195a1280 doc=(nil)
Load: device_num:3 filename:(null) host_adddr:0x201640 device_addr:(nil) bytes:52600
  Callback DataOp: target_id=1 host_op_id=2 optype=1 src=0x7ffc7b80579c src_device_num=3 dest=(nil) dest_device_num=0 bytes=4 code=0x7f59232d8500
Allocated 256 bytes at 0x940120 in buffer request callback
  Callback DataOp: target_id=1 host_op_id=3 optype=2 src=0x7ffc7b80579c src_device_num=0 dest=0x7f5912400000 dest_device_num=3 bytes=4 code=0x7f59232d83cd
Allocated 256 bytes at 0x93fa70 in buffer request callback
Callback Target: target_id=1 kind=2 endpoint=2 device_num=3 code=0x2100e5
Callback Target: target_id=4 kind=1 endpoint=1 device_num=3 code=0x2101ae
  Callback DataOp: target_id=4 host_op_id=5 optype=1 src=0x7ffc7b867530 src_device_num=3 dest=(nil) dest_device_num=0 bytes=400000 code=0x7f59232d8500
Allocated 256 bytes at 0x940bd0 in buffer request callback
  Callback DataOp: target_id=4 host_op_id=6 optype=2 src=0x7ffc7b867530 src_device_num=0 dest=0x7f5912401000 dest_device_num=3 bytes=400000 code=0x7f59232d83cd
Executing buffer complete callback: 0 0x940120 208 0x940120 0
rec=0x940120 type=9 time=1651265829237242163 thread_id=0 target_id=1
	  Record DataOp: host_op_id=2 optype=1 src_addr=0x7f5912400000 src_device=3 dest_addr=0x7ffc7b80579c dest_device=3 bytes=4 end_time=1651265829237293543 duration=51380 ns codeptr=0x7f59232d8500
rec=0x940188 type=9 time=1651265829237295293 thread_id=0 target_id=1
	  Record DataOp: host_op_id=3 optype=2 src_addr=0x7ffc7b80579c src_device=3 dest_addr=0x7f5912400000 dest_device=3 bytes=4 end_time=1651265829238104440 duration=809147 ns codeptr=0x7f59232d83cd
Executing buffer complete callback: 0 0x940120 0 (nil) 1
Deallocated 0x940120

I repeated this with device(0), device(1), and device(2) in place of device(3) and saw the same pattern: the callbacks always report the transfer as being between device number 0 and device x.

So it seems the OpenMP runtime passes 0 as the device number to the ompt_callback_target_data_op_emi callback both for the initial device (whose number, according to the OpenMP spec, equals the number of devices; 4 here) and for the actual device number 0.

A possibly separate issue is that, at least for the Record DataOp records, src_device == dest_device == the non-initial device number.
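
To make the expectation concrete, here is a minimal sketch of the check a tool could apply to the device numbers it receives. The names classify_device and transfer_is_consistent are hypothetical helpers, not part of OMPT. The sketch assumes the spec's numbering (target devices are 0 .. num_devices-1 and the initial device number equals num_devices) and treats any other value as invalid.

```c
#include <stdbool.h>

/* Hypothetical helper, not an OMPT API: classify a reported device number.
 * Assumption: target devices are numbered 0 .. num_devices-1 and the
 * initial (host) device number equals num_devices. */
typedef enum { DEV_TARGET, DEV_HOST, DEV_INVALID } dev_kind_t;

static dev_kind_t classify_device(int device_num, int num_devices) {
    if (device_num == num_devices)
        return DEV_HOST;
    if (device_num >= 0 && device_num < num_devices)
        return DEV_TARGET;
    return DEV_INVALID;
}

/* Hypothetical helper: returns true when a host<->device transfer reports
 * one endpoint as the host and the other as the expected target device. */
static bool transfer_is_consistent(int src_device, int dest_device,
                                   int target_device, int num_devices) {
    if (classify_device(target_device, num_devices) != DEV_TARGET)
        return false;
    bool host_to_dev =
        classify_device(src_device, num_devices) == DEV_HOST &&
        dest_device == target_device;
    bool dev_to_host =
        src_device == target_device &&
        classify_device(dest_device, num_devices) == DEV_HOST;
    return host_to_dev || dev_to_host;
}
```

With the 4 devices from the output above, the callback pair src_device_num=0, dest_device_num=3 fails this check, as does the record pair src_device=3, dest_device=3. A pair of 4 and 3 would pass.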

ZealotTKO, Apr 29 '22 21:04

Fixed with this commit: https://github.com/RadeonOpenCompute/llvm-project/commit/2d13fd548e385fcd6e3fe85b6261c10428ef6029

dhruvachak, May 31 '22 18:05

This should have been fixed for some time now. @ZealotTKO can you confirm and close the issue? Thanks.

jplehr, Feb 21 '24 19:02