GenericProperty (and therefore Expando) changes str values to bytes (sometimes)
Expected Behavior
If I set the value of a GenericProperty to a str, then read it back from datastore, the value should still always be a str. The value I read back should equal the value I put in.
Actual Behavior
If I set a GenericProperty to the string 'oops', then read it back from datastore, the value is now b'oops', so it is not equal to the value I put in the property.
If I set a GenericProperty to the string 'oøps', then read it back from datastore, the value is still 'oøps'. Apparently we are deciding to do different behavior depending on the contents of the string.
The same thing happens in appengine Python 2 incidentally, but it was not a big problem in practice because in Python 2, 'oops' == u'oops' was True. In Python 3 it is a problem :)
Steps to Reproduce the Problem
class Oops(ndb.Model):
generic_prop = ndb.GenericProperty()
string_prop = ndb.StringProperty()
oops = Oops(generic_prop="oøps", string_prop="oøps")
assert "oøps" == oops.generic_prop == oops.string_prop
oops.put()
assert "oøps" == oops.generic_prop == oops.string_prop
oops = oops.key.get()
assert "oøps" == oops.generic_prop == oops.string_prop
oops = oops.key.get(use_cache=False)
assert oops.generic_prop == oops.string_prop # <- Succeeds: both properties are still "oøps"
oops = Oops(generic_prop="oops", string_prop="oops")
assert "oops" == oops.generic_prop == oops.string_prop
oops.put()
assert "oops" == oops.generic_prop == oops.string_prop
oops = oops.key.get()
assert "oops" == oops.generic_prop == oops.string_prop
oops = oops.key.get(use_cache=False)
assert oops.generic_prop == oops.string_prop # <- Fails, because now generic_prop is b'oops'
Specifications
- Version: appengine-python-standard 1.1.3, Python 3.11.4
- Platform: MacOS
I think changing one line here, from sval.decode('ascii') to sval = str(sval.decode('ascii')), ~~should fix the issue without causing problems for anyone who isn't doing something really strange?~~ edit: this would break indexed GenericProperties, see linked PR for a fix for this at least with unindexed ones.
https://github.com/GoogleCloudPlatform/appengine-python-standard/blob/a900a13130eb71d6f2dd67ca2d1573c7559053d9/src/google/appengine/ext/ndb/model.py#L2743-L2750
Edit: I guess this would break if you are putting bytes in and expecting bytes out, but for that case, GenericProperty will still fail for any bytes object that isn't an ascii-encoded string anyway, for example if you are putting binary data into it (but this would also fail in Python 2). Since binary data was never (edit: "properly") supported, it shouldn't be a compatibility issue to not support bytes here in Python 3 either and just assume the user wanted a str encoded as ascii or utf-8, which was assumed before also. (edit: the linked PR only assumes you wanted a str if you passed in a str, and doesn't change/fix any behavior if you explicitly passed bytes)
GenericProperty is also used internally for Expando though, so I don't know if a simple change like this would mess anything up with that. It's surprising to me that this issue hasn't been reported here or in the Python NDB library where it's also like this, maybe people don't use these two features very much.
Same issue exists with Expando:
class OopsExpando(ndb.Expando):
orig_prop = ndb.StringProperty()
value_to_put = 'oops'
oops = OopsExpando(orig_prop=value_to_put, added_prop=value_to_put)
oops.put()
assert oops.orig_prop == oops.added_prop # <- succeeds
oops = oops.key.get(use_cache=False)
assert oops.orig_prop == oops.added_prop # <- fails, but works with "oøps"
Just a note: I noticed the existing tests for this only test with bytes objects containing encoded ascii strings. I doubt this is the most common use of string properties in Expando, and I almost wonder if the tests were simply modified in Python 3 this way to make them pass; please let me know if there's a better reason I'm not thinking of. Also, it will break if you happen to pass bytes objects containing encoded utf-8 strings - I didn't attempt to fix this, since it was most likely also broken in Python 2, and also probably not a common use case.