riptable icon indicating copy to clipboard operation
riptable copied to clipboard

Dataset cannot be dict_keys indexed

Open 972d5defe3218bd62b741e6a2f11f5b3 opened this issue 5 years ago • 2 comments

I am trying to index dataset columns using keys of a dictionary.

The following exception is raised in function _mask_get_item rt_struct.py (line 607) TypeError: Error in column slice; Dataset cannot be dict_keys indexed.

I propose to simply convert dict_keys to list in _mask_get_item. Would that be acceptable to the maintainers?

Thank you for reporting this.

Your suggestion for a fix seems reasonable, but I think with a small bit of extra work it could be made more general and powerful. What about this fix:

  1. The code in Struct._mask_get_item() which checks for/handles the list case can be modified to check if the index value was an instance of Sequence (from either the typing or collections.abc modules).
  2. The change in (1) will break things on it's own, because it'll catch the (str, bytes) and tuple cases further down (the tuple case is currently implicit, in the else). So the (str, bytes) case needs to be moved above the list/Sequence case, and the logic from the else needs to be copied to a new, explicit check (isinstance(idx, tuple)), also above the list/Sequence case so it continues working correctly.
  3. Below the list/Sequence handler, add a check/handler isinstance(idx, Collection) (where Collection is from either typing or collections.abc). That handler could just convert idx to a list then make a self-recursive call, at which point the list/Sequence handler will catch it and do what you want.

The Python dict_keys type implements Collection, so you'll end up with the behavior you want. However, it'll also mean that you can index into a Dataset with a set of column names or other collection type, though the column ordering of the resulting Dataset will only be deterministic if the collection internally maintains some ordering of the elements.

If you submit a PR to fix this, please also include at least one unit test demonstrating the new behavior (e.g. indexing into a Dataset with an instance of dict_keys).

jack-pappas avatar Apr 19 '21 17:04 jack-pappas

I like it. How about we go even further? Consider instead changing isinstance(idx,list) to isinstance(idx,Iterable), thus allowing generators as well. Then, just add idx = list(idx) at the top of the elif branch. Note that doing so also allows column tuples to pass through if we write the more specific isinstance(idx,tuple) and not by_col_arg.