Dataset cannot be dict_keys indexed
I am trying to index dataset columns using keys of a dictionary.
The following exception is raised in function _mask_get_item rt_struct.py (line 607)
TypeError: Error in column slice; Dataset cannot be dict_keys indexed.
I propose to simply convert dict_keys to list in _mask_get_item. Would that be acceptable to the maintainers?
Thank you for reporting this.
Your suggestion for a fix seems reasonable, but I think with a small bit of extra work it could be made more general and powerful. What about this fix:
- The code in
Struct._mask_get_item()which checks for/handles thelistcase can be modified to check if the index value was an instance ofSequence(from either thetypingorcollections.abcmodules). - The change in (1) will break things on it's own, because it'll catch the
(str, bytes)andtuplecases further down (thetuplecase is currently implicit, in theelse). So the(str, bytes)case needs to be moved above thelist/Sequencecase, and the logic from theelseneeds to be copied to a new, explicit check (isinstance(idx, tuple)), also above thelist/Sequencecase so it continues working correctly. -
Below the
list/Sequencehandler, add a check/handlerisinstance(idx, Collection)(whereCollectionis from eithertypingorcollections.abc). That handler could just convertidxto a list then make a self-recursive call, at which point thelist/Sequencehandler will catch it and do what you want.
The Python dict_keys type implements Collection, so you'll end up with the behavior you want. However, it'll also mean that you can index into a Dataset with a set of column names or other collection type, though the column ordering of the resulting Dataset will only be deterministic if the collection internally maintains some ordering of the elements.
If you submit a PR to fix this, please also include at least one unit test demonstrating the new behavior (e.g. indexing into a Dataset with an instance of dict_keys).
I like it. How about we go even further? Consider instead changing isinstance(idx,list) to isinstance(idx,Iterable), thus allowing generators as well. Then, just add idx = list(idx) at the top of the elif branch. Note that doing so also allows column tuples to pass through if we write the more specific isinstance(idx,tuple) and not by_col_arg.