FA.max() returns wrong (and random) value if the array contains nan

Open: MarcMassar opened this issue 4 years ago • 1 comment

When the array contains NaNs, fa.max() returns wrong values.

On small arrays, it seems to return the max of the values after the last nan:

Python 3.8.10 (default, Jun  4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import riptable as rt
>>> import numpy as np
>>> rt.__version__
'1.0.54'
>>> np.__version__
'1.20.2'
>>> rt.FA([100., np.nan, 1.]).max()
1.0
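For illustration, the result above is what a reduction built on the rule "keep acc if acc > x, else take x" would produce. This is only a sketch of that rule in pure Python; `naive_max` is a hypothetical helper, not riptable's actual implementation.

```python
import math


def naive_max(values):
    """Hypothetical reduction using `acc if acc > x else x`.

    This is NOT riptable's code, only a model of a NaN-unaware max.
    """
    acc = values[0]
    for x in values[1:]:
        # Every ordered comparison involving NaN is False, so a NaN in x
        # replaces acc, and the value after a NaN replaces that NaN in turn.
        acc = acc if acc > x else x
    return acc


print(naive_max([100.0, math.nan, 1.0]))       # 1.0
print(naive_max([1.0, math.nan, 100.0, 5.0]))  # 100.0
```

Under this rule everything before the last NaN is forgotten, which matches the "max of the values after the last nan" behaviour reported here.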

On large arrays the return value is not always the same:

>>> n = int(1e8)
>>> fa = rt.FA(np.random.randn(n))
>>> fa[np.random.randint(n, size=n//10)] = np.nan
>>> set(fa.max() for _ in range(50))
{nan, nan, 2.4668546005014775, 3.7485398782405173, 3.9668486059257404, 3.399106783252142, 3.5371350208015797, 3.6332312696800657, 3.526567411431958, 3.4674234137543913, 3.514753497378607, 2.9181688136829638, 3.9906758327369833, 4.239881004227352, nan, nan, 3.396676484732101, 3.313535283993719, 4.112033674336235, nan, nan, nan, 3.498997074434874, 2.3311797068279683, 3.661987462561398, 4.579983239860476, 3.839393848492756, 3.4611311536397635}
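One possible explanation for the varying results, which nothing in this issue confirms, is that the array is reduced in chunks and that the chunk boundaries or the combination order differ between calls. The pure-Python sketch below only shows that a NaN-unaware max is sensitive to how the data is split; `naive_max` and `chunked_max` are hypothetical helpers, not riptable functions.

```python
import math


def naive_max(values):
    """Hypothetical NaN-unaware max: keep acc only if acc > x."""
    acc = values[0]
    for x in values[1:]:
        acc = acc if acc > x else x
    return acc


def chunked_max(values, chunk_size):
    """Reduce each chunk separately, then reduce the partial results.

    The chunking scheme is an assumption made for illustration only.
    """
    partials = [
        naive_max(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return naive_max(partials)


data = [5.0, math.nan, 1.0, 9.0, math.nan, 2.0]
print(chunked_max(data, 2))  # 9.0
print(chunked_max(data, 3))  # 2.0
```

The same data gives different answers for different chunk sizes, so any run-to-run variation in the split would be enough to make the result look random.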

MarcMassar, Jun 23 '21 18:06

It appears that the Python layer calls in with the REDUCE_MAX function number (202) regardless of the data set in use. Thus the NaN detection logic is not in play, and the reduction ends up trying to order a NaN, which cannot work because every ordered comparison with a NaN is false.
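As a rough sketch of what NaN-aware behaviour could look like, here are the two usual conventions in pure Python: propagate NaN (what NumPy's `np.max` does) or skip NaN (what `np.nanmax` does). The function names are hypothetical, and this says nothing about how riptable's NaN detection logic is actually written.

```python
import math


def max_propagating_nan(values):
    """Return NaN as soon as any element is NaN (hypothetical helper)."""
    acc = values[0]
    for x in values[1:]:
        if math.isnan(acc):
            break  # once NaN is seen, the result stays NaN
        if math.isnan(x) or x > acc:
            acc = x
    return acc


def max_ignoring_nan(values):
    """Return the max of the non-NaN elements, or NaN if there are none."""
    kept = [x for x in values if not math.isnan(x)]
    return max(kept) if kept else math.nan


sample = [100.0, math.nan, 1.0]
print(math.isnan(max_propagating_nan(sample)))  # True
print(max_ignoring_nan(sample))                 # 100.0
```

Either convention is deterministic, unlike the results shown in this issue.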

staffantj, Mar 01 '22 20:03