
Dataset.putmask mangles the data if columns are not in the right order

Open MarcMassar opened this issue 4 years ago • 0 comments

Calling Dataset.putmask with a values dataset that has different columns, or the same columns in a different order, silently produces garbage.

The issue happens because Dataset.putmask(mask, values) only validates that self.shape == values.shape, and then operates on the columns by index rather than by name.
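To make the failure mode concrete, here is a minimal sketch that uses plain dicts of lists as a stand-in for a dataset. It is not riptable's implementation; the helper names `putmask_by_position` and `putmask_by_name` are hypothetical and exist only to contrast pairing columns by index with pairing them by name.

```python
import math

def putmask_by_position(target, mask, values):
    """Pair columns by position after checking only the shape (the behaviour described above)."""
    t_names, v_names = list(target), list(values)
    if len(t_names) != len(v_names):
        raise ValueError("shape mismatch")
    for t_name, v_name in zip(t_names, v_names):
        for i, flag in enumerate(mask):
            if flag:
                target[t_name][i] = values[v_name][i]

def putmask_by_name(target, mask, values):
    """Pair columns by name and refuse to run if the column sets differ."""
    if set(target) != set(values):
        raise ValueError("column names differ")
    for name in target:
        for i, flag in enumerate(mask):
            if flag:
                target[name][i] = values[name][i]

def fresh_prices():
    # Same data as the 'prices' dataset in the report.
    return {"price": [100.0, 200.0], "stock_id": [1, 2]}

# Same columns as 'prices', but with the key column first.
aligned = {"stock_id": [1, 2], "price": [201.0, math.nan]}
mask = [math.isfinite(p) for p in aligned["price"]]  # [True, False]

by_position = fresh_prices()
putmask_by_position(by_position, mask, aligned)  # price receives stock_id and vice versa

by_name = fresh_prices()
putmask_by_name(by_name, mask, aligned)  # each column receives its own values
```

With positional pairing the first row ends up with the stock id in `price` and the price in `stock_id`; pairing by name updates only the price.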

Here is a realistic example (rt.merge_lookup puts the 'on' columns first, which causes the issue):

>>> import riptable as rt
>>> rt.__version__
'1.1.4'
>>> prices = rt.Dataset({'price':[100., 200.], 'stock_id':[1, 2]})
>>> updates = rt.Dataset({'price':[101., 201.], 'stock_id':[1, 1]})
>>> updates_aligned = rt.merge_lookup(prices, updates, on='stock_id', columns_left=[], keep='last')
>>> prices.putmask(updates_aligned.price.isfinite(), updates_aligned)
>>> prices
#    price   stock_id
-   ------   --------
0     1.00        201
1   200.00          2
[2 rows x 2 columns] total bytes: 32.0 B
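
Until putmask matches columns by name, one defensive option is to bring the values into the target's column order, and fail loudly on any name mismatch, before the call. The sketch below shows that check on plain dicts of lists; `align_columns` is a hypothetical helper, and how to perform the equivalent reordering on a real Dataset is an assumption left to the reader.

```python
def align_columns(target, values):
    """Return a view of `values` whose columns follow `target`'s column order.

    Raises KeyError if the two do not have exactly the same column names.
    """
    missing = [name for name in target if name not in values]
    extra = [name for name in values if name not in target]
    if missing or extra:
        raise KeyError(f"column mismatch: missing={missing}, extra={extra}")
    return {name: values[name] for name in target}

prices = {"price": [100.0, 200.0], "stock_id": [1, 2]}
# Key column first, as in the merge result from the report.
aligned = {"stock_id": [1, 2], "price": [201.0, 0.0]}

reordered = align_columns(prices, aligned)
column_order = list(reordered)  # now matches the order of 'prices'

# A values mapping with a different column set is rejected instead of being applied.
try:
    align_columns(prices, {"stock_id": [1, 2], "volume": [10, 20]})
    rejected = False
except KeyError:
    rejected = True
```

After this step, a positional pairing of columns lines up price with price and stock_id with stock_id.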

MarcMassar, Dec 01 '21 19:12