AllocateLikeNumpyArray() may return arrays with unwanted baggage

Open OrestZborowski-SIG opened this issue 4 years ago • 1 comments

The AllocateLikeNumpyArray() function is apparently meant to create a new array with the same shape as the template array, but it calls the PyArray_NewLikeArray API with subok=true, which triggers NumPy's 'new-from-template' mechanism (see docs). That mechanism invokes the __array_finalize__() method, which may copy attributes from the template into the new instance. This is great for foo.view() but not for this case, where we want a default-initialized instance.

Instead, this API should use the lower-level PyArray_NewFromDescr and pass in only the descriptor from the template, so that we get a default-initialized instance.
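The difference can be reproduced in pure NumPy, without riptable. The following is a minimal sketch, not the riptide_cpp code: NamedArray is a hypothetical ndarray subclass standing in for FastArray, and np.empty_like / np.empty are used as Python-level stand-ins for the C-level PyArray_NewLikeArray and PyArray_NewFromDescr calls discussed above.

```python
import numpy as np


class NamedArray(np.ndarray):
    """Hypothetical subclass that carries a 'name' attribute (stand-in for FastArray)."""

    def __new__(cls, data, name=None):
        obj = np.asarray(data).view(cls)
        obj.name = name
        return obj

    def __array_finalize__(self, obj):
        # Called on view casting and new-from-template; obj is the source array.
        # Copying the attribute here is what leaks the template's name.
        self.name = getattr(obj, "name", None)


template = NamedArray([1, 2, 3], name="NameABC")

# subok=True (the default): new-from-template, so the name is inherited.
like = np.empty_like(template)

# subok=False: a plain ndarray, no subclass attributes at all.
plain = np.empty_like(template, subok=False)

# Allocate from shape and dtype only, then view as the subclass:
# __array_finalize__ sees a plain ndarray, so the name stays at its default.
fresh = np.empty(template.shape, dtype=template.dtype).view(NamedArray)

print(type(like).__name__, like.name)    # NamedArray NameABC
print(type(plain).__name__)              # ndarray
print(type(fresh).__name__, fresh.name)  # NamedArray None
```

Only the first allocation carries the template's name over, which matches the behavior reported here; the third is the closest Python analogue of allocating from the descriptor alone.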

This was uncovered while investigating rtosholdings/riptable#285.

OrestZborowski-SIG, Jan 14 '22 20:01

This is a small program that demonstrates the issue:

import riptable as rt
import riptide_cpp as rc

print("initializing")
arr1 = rt.FA([b'ABC'])
arr2 = rt.FA([123])
arrs = [arr1, arr2]

print("setting arr1 name")
arr1.set_name('NameABC')
print(f"arr1.name={arr1.get_name()}")

print("calling rc.MultiKeyGroupBy32")
iKey, iFirstKey, unique_count = rc.MultiKeyGroupBy32(arrs, 0, None, 2, None)
print(f"key.name={iKey.get_name()}")

It emits:

initializing
setting arr1 name
arr1.name=NameABC
calling rc.MultiKeyGroupBy32
key.name=NameABC

So the resulting key array has the same name as the first input array, which serves as the template in the new-from-template allocation.

In v1.1.0, the same code emits:

initializing
setting arr1 name
arr1.name=NameABC
calling rc.MultiKeyGroupBy32
key.name=None

OrestZborowski-SIG, Jan 14 '22 20:01