riptable
Dataset.putmask mangles the data if columns are not in the right order
Calling Dataset.putmask with a values dataset whose columns differ from the target's, or whose columns are the same but in a different order, silently produces garbage.
The issue happens because Dataset.putmask(mask, values) only validates that self.shape == values.shape, and then operates on the columns by index rather than by name.
Here is a realistic example (rt.merge_lookup puts the 'on' columns first, which causes the issue):
>>> import riptable as rt
>>> rt.__version__
'1.1.4'
>>> prices = rt.Dataset({'price':[100., 200.], 'stock_id':[1, 2]})
>>> updates = rt.Dataset({'price':[101., 201.], 'stock_id':[1, 1]})
>>> updates_aligned = rt.merge_lookup(prices, updates, on='stock_id', columns_left=[], keep='last')
>>> prices.putmask(updates_aligned.price.isfinite(), updates_aligned)
>>> prices
#    price   stock_id
-   ------   --------
0     1.00        201
1   200.00          2
[2 rows x 2 columns] total bytes: 32.0 B
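The swap seen in the session can be reproduced without riptable. What follows is only a plain-Python sketch of the behavior described in this issue, not riptable's actual implementation: a "dataset" is modeled as a dict of lists, and the helper names putmask_by_position, putmask_by_name and make_prices are hypothetical. The values used for updates_aligned are assumed from the session (stock_id first, then price, with NaN for the unmatched row).

```python
# Toy model: a "dataset" is a dict mapping column name -> list of values.
# Python dicts preserve insertion order, which stands in for column order.

def putmask_by_position(target, mask, values):
    """Pair columns by position, mimicking the behavior described in this issue."""
    if len(target) != len(values):
        raise ValueError("column counts differ")
    for dst_col, src_col in zip(target.values(), values.values()):
        for i, flag in enumerate(mask):
            if flag:
                dst_col[i] = src_col[i]

def putmask_by_name(target, mask, values):
    """Pair columns by name and refuse mismatched column sets."""
    if set(target) != set(values):
        raise KeyError("column names differ")
    for name, dst_col in target.items():
        src_col = values[name]
        for i, flag in enumerate(mask):
            if flag:
                dst_col[i] = src_col[i]

def make_prices():
    return {"price": [100.0, 200.0], "stock_id": [1, 2]}

# Assumed contents of updates_aligned: the 'on' column comes first.
updates_aligned = {"stock_id": [1, 2], "price": [201.0, float("nan")]}
mask = [True, False]  # rows where the updated price is finite

by_position = make_prices()
putmask_by_position(by_position, mask, updates_aligned)

by_name = make_prices()
putmask_by_name(by_name, mask, updates_aligned)

print(by_position)  # {'price': [1, 200.0], 'stock_id': [201.0, 2]}
print(by_name)      # {'price': [201.0, 200.0], 'stock_id': [1, 2]}
```

Pairing by position writes the stock_id into price and the price into stock_id, matching the garbage in the session. Pairing by name gives the expected update, and raises when the column sets differ instead of continuing silently.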