riptable
FA.max() returns wrong (and random) value if the array contains nan
When the array contains NaNs, fa.max() returns wrong values.
On small arrays, it seems to return the max of the values after the last nan:
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import riptable as rt
>>> import numpy as np
>>> rt.__version__
'1.0.54'
>>> np.__version__
'1.20.2'
>>> rt.FA([100., np.nan, 1.]).max()
1.0
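For illustration only, here is a pure-Python sketch of a comparison-based max with no NaN handling. It is not riptable's actual code; the name `naive_max` and the exact comparison form are assumptions chosen because they reproduce the result above. Every comparison involving NaN is False, so a NaN discards whatever maximum was accumulated before it.

```python
import math


def naive_max(values):
    """Max via `best if best > x else x`, with no special handling of NaN.

    Hypothetical helper for illustration; not part of riptable.
    """
    it = iter(values)
    best = next(it)
    for x in it:
        # Any comparison with NaN is False, so whenever best or x is NaN
        # this expression picks x and the running maximum is lost.
        best = best if best > x else x
    return best


print(naive_max([100.0, math.nan, 1.0]))       # 1.0
print(naive_max([1.0, math.nan, 100.0, 2.0]))  # 100.0
```

In both calls the result is the max of the values after the last NaN. If a large array were reduced in chunks whose partial results are combined in varying order (an assumption, not confirmed here), the same effect could explain results that change between calls.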
On large arrays, the return value varies from call to call:
>>> n = int(1e8)
>>> fa = rt.FA(np.random.randn(n))
>>> fa[np.random.randint(n, size=n//10)] = np.nan
>>> set(fa.max() for _ in range(50))
{nan, nan, 2.4668546005014775, 3.7485398782405173, 3.9668486059257404, 3.399106783252142, 3.5371350208015797, 3.6332312696800657, 3.526567411431958, 3.4674234137543913, 3.514753497378607, 2.9181688136829638, 3.9906758327369833, 4.239881004227352, nan, nan, 3.396676484732101, 3.313535283993719, 4.112033674336235, nan, nan, nan, 3.498997074434874, 2.3311797068279683, 3.661987462561398, 4.579983239860476, 3.839393848492756, 3.4611311536397635}
It appears that the Python layer calls in with the REDUCE_MAX function number (202) regardless of the data set in use. As a result, the NaN detection logic never runs, and the reduction ends up trying to order NaNs, which have no defined order.
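As a rough sketch of what NaN-aware behaviour could look like, the two pure-Python functions below show the two usual conventions: propagate NaN (as `numpy.max` does) or skip NaN (as `numpy.nanmax` does). The function names are hypothetical and this is not the riptable implementation; it only shows deterministic results that do not depend on where the NaNs sit.

```python
import math


def max_propagate_nan(values):
    """Return NaN if any element is NaN, otherwise the maximum."""
    best = None
    for x in values:
        if math.isnan(x):
            return x  # a single NaN decides the result
        if best is None or x > best:
            best = x
    if best is None:
        raise ValueError("max of an empty sequence")
    return best


def max_ignore_nan(values):
    """Return the maximum of the non-NaN elements.

    Returns NaN when there is no non-NaN element (a choice made for
    this sketch).
    """
    best = None
    for x in values:
        if math.isnan(x):
            continue  # never let a NaN take part in a comparison
        if best is None or x > best:
            best = x
    return math.nan if best is None else best


data = [100.0, math.nan, 1.0]
print(max_propagate_nan(data))  # nan
print(max_ignore_nan(data))     # 100.0
```

Either convention gives the same answer on every call, unlike the outputs reported above.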