dpctl: dpt.pow() with dtype=`c16` and a scalar exponent returns different results on GPU and CPU

The example below produces different results depending on the device:

import dpctl.tensor as dpt

a = dpt.asarray([0], dtype='c16', device='gpu')
dpt.pow(a,1)
# usm_ndarray([0.+0.j])

a = dpt.asarray([0], dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([nan+nanj])

With dtype='c8', the result is the same on both devices:

import dpctl.tensor as dpt

a = dpt.asarray([0], dtype='c8', device='gpu')
dpt.pow(a, 1)
# usm_ndarray([0.+0.j], dtype=complex64)

a = dpt.asarray([0], dtype='c8', device='cpu')
dpt.pow(a, 1)
# usm_ndarray([0.+0.j], dtype=complex64)

I also noticed that, for dtype c16, dpt.pow returns correct results when the input array has between 2 and 7 elements, but with 8 elements the last four are NaN:

import dpctl.tensor as dpt


a = dpt.zeros((2,), dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([0.+0.j, 0.+0.j])

a = dpt.zeros((7,), dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j])


a = dpt.zeros((8,), dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([ 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j, nan+nanj, nan+nanj,
             nan+nanj, nan+nanj])
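
The size dependence above could be narrowed down with a sweep over array sizes. The following is only a sketch: `nan_positions`, `sweep_sizes`, and `dpctl_pow_on_zeros` are hypothetical helper names, not part of dpctl, and the sweep range of 1 to 16 is an arbitrary assumption.

```python
import cmath


def nan_positions(values):
    """Return the indices of elements whose real or imaginary part is NaN."""
    return [i for i, v in enumerate(values) if cmath.isnan(complex(v))]


def sweep_sizes(pow_on_zeros, sizes=range(1, 17)):
    """Map each size to the NaN positions found in pow_on_zeros(size).

    `pow_on_zeros` is any callable that returns, as a list of complex
    numbers, the result of raising a zero-filled array of that size to
    the power 1. Sizes with no NaN are left out of the report.
    """
    report = {}
    for n in sizes:
        bad = nan_positions(pow_on_zeros(n))
        if bad:
            report[n] = bad
    return report


def dpctl_pow_on_zeros(size, dtype="c16", device="cpu"):
    """Run the reproducer from this report on a real device (needs dpctl)."""
    # Imported lazily so the helpers above can be used without dpctl.
    import dpctl.tensor as dpt

    a = dpt.zeros((size,), dtype=dtype, device=device)
    # Copy the result to the host and convert it to Python complex numbers.
    return dpt.asnumpy(dpt.pow(a, 1)).tolist()
```

With dpctl installed, `sweep_sizes(dpctl_pow_on_zeros)` should return an empty dict when no NaNs are produced; any key in the result is an array size that still misbehaves, with the affected indices as its value.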

There is also an interesting case when x2 (the scalar) is a NumPy scalar type: dpt.pow then returns NaNs for an input array of data type c8 as well:

import dpctl.tensor as dpt
import numpy

a = dpt.zeros((8,), dtype='c8', device='cpu')
dpt.pow(a, 1)
# usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
             0.+0.j], dtype=complex64)

dpt.pow(a, numpy.int32(1))
# usm_ndarray([ 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j, nan+nanj, nan+nanj,
             nan+nanj, nan+nanj])

Aug 28 '23 12:08 vlad-perevezentsev

@vlad-perevezentsev These discrepancies seem to have been resolved recently.

In [1]: import dpctl.tensor as dpt, numpy as np

In [2]: a = dpt.asarray([0], dtype='c16', device='cpu')

In [3]: dpt.pow(a,1)
Out[3]: usm_ndarray([0.+0.j])

In [4]: a = dpt.zeros((2,), dtype='c16', device='cpu')

In [5]: dpt.pow(a,1)
Out[5]: usm_ndarray([0.+0.j, 0.+0.j])

In [6]: a = dpt.zeros((8,), dtype='c16', device='cpu')

In [7]: dpt.pow(a,1)
Out[7]:
usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
             0.+0.j])

For the NumPy dtype case:

In [1]: import dpctl.tensor as dpt, numpy as np

In [2]: a = dpt.zeros((8,), dtype='c8', device='cpu')

In [3]: dpt.pow(a, np.int32(1))
Out[3]:
usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
             0.+0.j])

It's hard to tell whether this was fixed by #1411 or by the change in compiler version.

Either way, if you can confirm that these issues are resolved for you as well, we can consider this issue resolved.
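
One way to confirm this locally is to re-run the reported combinations and check that every element is exactly zero. This is a minimal sketch under assumptions: `failing_cases`, `REPORTED_CASES`, and `run_on_dpctl` are hypothetical names introduced for illustration, and the case list only covers the CPU combinations mentioned in this thread.

```python
def failing_cases(run_case, cases):
    """Return the cases whose result is not all exact zeros.

    For zero-filled input and exponent 1, every element should equal 0+0j.
    NaN never compares equal to 0, so NaN results count as failures.
    """
    return [case for case in cases if any(v != 0 for v in run_case(*case))]


# (dtype, size, exponent kind) combinations taken from this thread.
REPORTED_CASES = [
    ("c16", 1, "python_int"),
    ("c16", 8, "python_int"),
    ("c8", 8, "python_int"),
    ("c8", 8, "numpy_int32"),
]


def run_on_dpctl(dtype, size, exponent_kind, device="cpu"):
    """Run one case on a real device (needs dpctl and numpy)."""
    # Imported lazily so failing_cases can be used without dpctl.
    import dpctl.tensor as dpt
    import numpy as np

    exponent = np.int32(1) if exponent_kind == "numpy_int32" else 1
    a = dpt.zeros((size,), dtype=dtype, device=device)
    # Copy the result to the host and convert it to Python complex numbers.
    return dpt.asnumpy(dpt.pow(a, exponent)).tolist()
```

Calling `failing_cases(run_on_dpctl, REPORTED_CASES)` should return an empty list if the discrepancies are gone on the installed version; any returned tuple identifies a combination that still produces non-zero or NaN output.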

Nov 23 '23 22:11 ndgrigorian

@vlad-perevezentsev I think this issue is ready to be resolved.

Jan 24 '24 14:01 oleksandr-pavlyk