asdf icon indicating copy to clipboard operation
asdf copied to clipboard

tuple keys not handled

Open bblais opened this issue 7 years ago • 6 comments

Smallest program which reproduces the error (asdf from conda 2.3.1):

import asdf

mydata={'hello':'there',
        (6,5):'bad key',
        'that':6,
       }
# Make the tree structure, and create a AsdfFile from it.
tree = {'hello':'there',
        'data':mydata,
       }
ff = asdf.AsdfFile(tree)  # saves, but error on load below
ff.write_to("test.asdf")

ff = asdf.AsdfFile(mydata)  # gives TypeError: unhashable type: 'list'
ff.write_to("test2.asdf")

with asdf.open('test.asdf') as af: # gives ConstructorError : while constructing a mapping
    print(af.tree)

Is this a known issue? thanks!

bblais avatar Feb 01 '19 15:02 bblais

Hi @bblais, thanks for the bug report. I am able to reproduce this and will investigate further.

This looks like it may actually be a limitation of the YAML implementation we use, but I need to dig deeper.

drdavella avatar Feb 01 '19 16:02 drdavella

This does in fact appear to be a limitation of pyyaml. Consider the following example:

import yaml

mydata = { 'hello':'there', (6,5):'bad key', 'that':6 }
s = yaml.safe_dump(mydata)
yaml.safe_load(s)

The result is the same error as in your example:

~/miniconda3/envs/asdfdev/lib/python3.6/site-packages/yaml/constructor.py in construct_mapping(self, node, deep)
    126             if not isinstance(key, collections.Hashable):
    127                 raise ConstructorError("while constructing a mapping", node.start_mark,
--> 128                         "found unhashable key", key_node.start_mark)
    129             value = self.construct_object(value_node, deep=deep)
    130             mapping[key] = value

ConstructorError: while constructing a mapping
  in "<unicode string>", line 1, column 1:
    hello: there
    ^
found unhashable key
  in "<unicode string>", line 2, column 3:
    ? [6, 5]
      ^

There doesn't seem to be anything in the YAML spec that would prevent it from being used this way, so this may actually be a bug in the pyyaml implementation.

drdavella avatar Feb 01 '19 16:02 drdavella

It's worth noting that ruamel.yaml appears to handle this case correctly. We have an issue for exploring the use of that package vs pyyaml (#391), so this would be a motivating use case.

drdavella avatar Feb 01 '19 17:02 drdavella

Generally speaking, aren't keys supposed by hashable? Certainly python tuples are hashable because they are immutable, so it is expected that on the way out (via writing) this works. But on read, it seems there is no way to know whether an array in a YAML file will be used as a list (mutable) or tuple (immutable), so no way of knowning if it is hashable or not from the YAML.

jdavies-st avatar Feb 01 '19 18:02 jdavies-st

@jdavies-st the YAML standard has no concept of what is hashable and what is not. And the standard seems to say that anything that is representable in unicode can be used as a key in a mapping, including other YAML objects. So there's no intrinsic issue with doing this as far as YAML itself is concerned.

What ruamel.yaml appears to do is to store the tuple as a YAML sequence (i.e. list) in order to use it as a key. When loading the file, it converts the sequence key back into a tuple since a Python list would not have been hashable in this way. This seems pretty reasonable to me. The fact that pyyaml does not do this seems to be a bug.

drdavella avatar Feb 01 '19 18:02 drdavella

Just tested this using the new release of pyyaml (version 5.1) and this still appears to be broken.

drdavella avatar Mar 18 '19 14:03 drdavella