psych icon indicating copy to clipboard operation
psych copied to clipboard

Can't deserialize a string that looks like a float properly

Open willbryant opened this issue 12 years ago • 7 comments

Under Syck if you register a domain type, the block will get the tag name and the value yielded to it as they are in the actual raw YAML.

In Psych however the value string has already been tokenized by the call at the bottom of Psych::Visitors::ToRuby#deserialize, before #accept is called.

This is a problem because if your serialized value string happens to look like a floating point number, then it will get parsed as a Float and your deserializer won't be able to get the original string so that it can parse it properly (without float's precision problems).

Is there another way to add domain class parsing in a way that won't get tokenized? It seems that at the moment we have to extend the big case statement in #deserialize (or use alias_method_chain to add a special case).

If not could we maybe rearrange slightly so that #tokenize is not called before #accept?

willbryant avatar Jun 05 '13 06:06 willbryant

I see. What would you like passed to the block? The scalar value, or the node? I think the node makes more sense, but it's up to you.

tenderlove avatar Nov 26 '13 22:11 tenderlove

Yeah, good question - personally I would only want the string (which IIRC is what we got from Syck), but I can see the case for returning the node.

If there's already other public APIs in Psych that expect callers to work with Node objects then that seems reasonable and is perhaps more future-proof? Otherwise I'd keep it simple and use strings.

willbryant avatar Nov 30 '13 02:11 willbryant

Ya, the problem with strings is that you could register a handler for a complex type like a hash. In that case, passing a string wouldn't be possible.

Aaron Patterson http://tenderlovemaking.com/ I'm on an iPhone so I apologize for top posting.

On Nov 29, 2013, at 6:02 PM, Will Bryant [email protected] wrote:

Yeah, good question - personally I would only want the string (which IIRC is what we got from Syck), but I can see the case for returning the node.

If there's already other public APIs in Psych that expect callers to work with Node objects then that seems reasonable and it perhaps more future-proof? Otherwise I'd keep it simple and use strings.


Reply to this email directly or view it on GitHub.

tenderlove avatar Nov 30 '13 04:11 tenderlove

That's interesting - are you thinking of an API that will hand over both the original representation (node) and the default parsed representation (potentially a hash)?

willbryant avatar Nov 30 '13 04:11 willbryant

No, just the node. It's pretty easy to convert the node to Ruby (off the top of my head you just call to_ruby on the node), but I can't represent say a has stricture as a string. I'd have to pass you the node.

I'll write an example, and maybe that will clear it up!

Aaron Patterson http://tenderlovemaking.com/ I'm on an iPhone so I apologize for top posting.

On Nov 29, 2013, at 8:58 PM, Will Bryant [email protected] wrote:

That's interesting - are you thinking of an API that will hand over both the original representation (node) and the default parsed representation (potentially a hash)?


Reply to this email directly or view it on GitHub.

tenderlove avatar Nov 30 '13 06:11 tenderlove

Here is an example:

require 'psych'

Psych.add_domain_type 'foo.bar,2002', 'foo' do |type, val|
  p [val, val.class]
end

Psych.load('--- !foo.bar,2002/foo hello')
Psych.load('--- !foo.bar,2002/foo { "foo":"hello" }')

The domain type can be used for anything (Scalars, Arrays, Hashes). It would make sense for val to be a consistent type (always a String object, etc). The only consistent type I could provide is an AST node object.

Can I ask what you're using this for? I'd like to improve this API, but I don't have any real-world examples.

tenderlove avatar Nov 30 '13 21:11 tenderlove

We have a BigDecimal serialisation, which we put in in 2007 before there was any standard format. At that time we used Syck:

  class BigDecimal
    def to_yaml(opts = {}) # if this mysterious stops working when you upgrade, check for new core extensions
      YAML::quick_emit(object_id, opts) do |out|
        out.scalar("tag:ruby.yaml.org,2002:decimal", to_s) # this is a standard Ruby type but not a standard Ruby YAML type, so this is slightly naughty.  if the community ever does set a standard, we can easily convert ours...
      end
    end
  end

  YAML.add_ruby_type("decimal") {|type, val| BigDecimal.new(val)}

But now with Psych, to avoid the values getting converted to Float values before we get them (and so losing precision), I found we had to extend the whole method (or monkeypatch to add to the case statement inside Psych::Visitors::ToRuby#deserialize):

  class BigDecimal
    def encode_with(coder)
      coder.scalar "!ruby/decimal", to_s
    end
  end

  class Psych::Visitors::ToRuby
    def deserialize_with_ruby_decimal(o)
      if o.tag == "!ruby/decimal"
        BigDecimal.new(o.value)
      else
        deserialize_without_ruby_decimal(o)
      end
    end
    alias_method_chain :deserialize, :ruby_decimal
  end

willbryant avatar Dec 01 '13 06:12 willbryant