Type ambiguity between strings and numbers
What steps will reproduce the problem?
1. Parse this YAML string: data: "1"
2. Try to convert the data like so: node.as<int>(), succeeds (!)
3. Try to convert the data like so: node.as<string>(), succeeds
4. See node.Scalar() return the string: 1
What is the expected output? What do you see instead?
I expect this to work like YAML does in Python where the type of the value the
user placed in the file is preserved:
>>> import yaml
>>> yaml.load('data: 1')
{'data': 1}
>>> yaml.load('data: "1"')
{'data': '1'}
yaml-cpp totally throws away the true type of the value and it appears
impossible to figure out what it was in the first place.
What version of the product are you using? On what operating system?
I am using Version 0.5.1-1 on Ubuntu 12.04 32 bits
Please provide any additional information below.
For my use case I can't use YAML cpp because of the ambiguity.
Original issue reported on code.google.com by [email protected] on 23 Oct 2014 at 7:33
I just ran into this bug. As far as I can tell, there's no way to determine the original type intention (i.e. was the value quoted?) to prevent strings that contain boolean, integral, or floating point values from converting to primitive types. Our software parses YAML and expects values that were specified as strings to remain strings in our type system. Thus, parsing '5' creates an integer in our type system where a string was expected.
On reviewing the spec again I see it says "In YAML, untagged nodes are given a type depending on the application." It seems that the only thing missing is knowing whether the scalar was an explicit string. Then the app has enough information to handle all cases.
If the parsed scalar has no type defined via tags and is not an explicit string, then the application can choose to attempt parsing the scalar as a non-string type if appropriate. As far as I can tell this can't be done at the moment, so that's the core issue. Perhaps a simple method and member flag on Node like .isExplicitString() that returns a bool would do the trick? Or even simply .quoted().
A quoted member would be sufficient for my purposes, at least.
A quoted member would be helpful for me too. The same issue arises when encoding, by the way: strings that could be read as booleans or numbers should be explicitly quoted on output, otherwise a string like "false" would be read back as a boolean.
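To illustrate the encoding side, here is a minimal sketch of the check an application could run before writing a string. The names looks_like_non_string and emit_string are hypothetical and not part of yaml-cpp, and the list of special words is an assumption that covers only a few common spellings. It only addresses the type ambiguity discussed here; it does not escape embedded quotes or handle other characters that need quoting in YAML.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true if `s`, written as a plain (unquoted)
// scalar, could be read back as something other than a string.
bool looks_like_non_string(const std::string& s) {
    if (s.empty()) return true;  // an empty plain scalar is usually read as null
    if (s == "true" || s == "false" || s == "null" || s == "~") return true;

    // Numeric check: strtod must consume the whole string.
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Hypothetical helper: quotes only the strings that would otherwise be ambiguous.
// No escaping is done in this sketch.
std::string emit_string(const std::string& s) {
    if (!looks_like_non_string(s)) return s;
    return "\"" + s + "\"";
}
```

With this, emit_string("false") and emit_string("42") come out double-quoted, while emit_string("hello") is written as-is.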
I am looking at the code right now; am I correct that Token::NON_PLAIN_SCALAR signifies an explicit string scalar (quoted or otherwise), while Token::PLAIN_SCALAR would be just any token?
In this case we would just have to pass through that information…
OK, passing a quoted boolean flag through to the event parser seems to be fairly trivial: https://github.com/koraa/yaml-cpp/commit/6993d984df8d553b601276527c2977abb88230e8
Converting to the proper type with that information seems to work fine https://github.com/koraa/large-yaml2json-json2yaml/blob/master/yaml2json.cc#L72 .
I suspect I'd have to add support for this flag to the Nodes too. Any other API where that flag would have to be added?
Sorry I didn't see this discussion. The spec section 3.3.2 describes "non-specific tags" for nodes that aren't explicitly tagged; for non-plain (e.g. quoted) scalars, this is "!".
The spec also describes tag resolution for non-specific tags. In the quoted scalar case, it would resolve to "tag:yaml.org,2002:str".
yaml-cpp adds non-specific tags, but does not resolve them. This means that you can check the tag of a scalar node, and if it's "!", then it should be a string. If it's "?", then it was a plain scalar, and might be resolved in various ways depending on the application.
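As a rough sketch of how an application might act on this, the function below takes the tag and the scalar text as plain strings (in yaml-cpp these would come from node.Tag() and node.Scalar()) so that it stays self-contained. The name resolve_scalar, the Value type, and the choice to recognize only true/false and decimal integers are assumptions made for illustration, not yaml-cpp behavior.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>
#include <variant>

// Application-level value type; the set of alternatives is an assumption.
using Value = std::variant<std::string, bool, long long>;

// Hypothetical resolver: `tag` is the node's tag, `text` is its scalar text.
Value resolve_scalar(const std::string& tag, const std::string& text) {
    // "!" marks a non-plain (e.g. quoted) scalar: always keep it a string.
    if (tag == "!") return text;
    // Explicit tags (and anything else that is not "?") are not resolved here.
    if (tag != "?") return text;

    // "?" marks a plain scalar: the application decides how to resolve it.
    if (text == "true") return true;
    if (text == "false") return false;

    long long n = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, n);
    if (ec == std::errc() && ptr == last) return n;  // whole text is an integer

    return text;  // plain scalar that is neither a bool nor an integer
}
```

So a scalar with tag "!" and text "1" stays a string, while the same text with tag "?" becomes an integer.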
@jbeder Looks like using Tag will be a suitable work around for my issue. Thanks for the information!
I'm also looking for a way to ask yaml-cpp about the original type. In my code 42 and "42" are two completely different things. I also think that throwing exceptions is not the right thing. There's nothing exceptional in client code inquiring about a data type, so this sounds like abusing the exception mechanism to me. Instead why don't you just make Node::Scalar() return a variant? It seems to me you're struggling with the best way to handle something that std::variant or even eggs::variant already implemented. IMHO this logic doesn't belong to yaml-cpp.
On the node's side I'd write something like this (assuming eggs::variant):
class Node {
    typedef eggs::variant<std::string, int, float, bool> value_type;
public:
    // just a convenience method for backwards compatibility, will throw if variant contains the wrong type
    template <typename T> const T& as() { return eggs::get<T>(Scalar()); }
    // this one tries to convert the value to T if variant doesn't hold a T already
    template <typename T> T convert() const {
        if (this->Scalar().which() == type_index_in_variant<value_type, T>) {
            return eggs::get<T>(Scalar());
        }
        else {
            std::stringstream ss;
            eggs::apply([&ss](auto v) { ss << v; }, Scalar());
            T retval;
            ss >> retval;
            return retval;
        }
    }
};
Where type_index_in_variant is a small helper metafunction that I implemented like this:
template <std::size_t I, typename Search, typename... Args>
struct TypeIndexInList;

template <std::size_t I, typename Search>
struct TypeIndexInList<I, Search> {
    static constexpr std::size_t value = I;
};

template <std::size_t I, typename Search, typename Arg, typename... Args>
struct TypeIndexInList<I, Search, Arg, Args...> {
    static constexpr std::size_t value = (std::is_same<Search, Arg>::value ?
        I :
        TypeIndexInList<I + 1, Search, Args...>::value
    );
};

template <typename Search, typename... Args>
constexpr std::size_t type_index_in_variant = 0;

template <typename Search, typename... Args>
constexpr std::size_t type_index_in_variant<eggs::variant<Args...>, Search>
    = TypeIndexInList<0, Search, Args...>::value;
I hope this code will help fixing this long outstanding issue. This is a blocker and I have to find a solution, thanks for looking into this asap.
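For comparison, the same convert idea can be sketched with C++17's std::variant, where std::holds_alternative does the job of the index helper. This is only an illustration of the proposal, not something yaml-cpp provides; the Scalar alias and the free function convert are hypothetical names, and double is used instead of float as an arbitrary choice.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical scalar type holding whatever the parser decided the value was.
using Scalar = std::variant<std::string, int, double, bool>;

// Returns the stored value if it already is a T; otherwise round-trips it
// through a stringstream, as in the proposal above.
template <typename T>
T convert(const Scalar& s) {
    if (std::holds_alternative<T>(s)) return std::get<T>(s);

    std::stringstream ss;
    std::visit([&ss](const auto& v) { ss << v; }, s);
    T out{};
    ss >> out;  // on failure, out is left as a zero/empty value
    return out;
}
```

For example, convert<int> on a Scalar holding the string "7" yields 7, and convert<std::string> on a Scalar holding the int 42 yields "42".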
Any comments or updates on this please? @jbeder?
+1
One approach I have found that worked is using YAML::convert<T>::decode, something like:
if (node.Tag() == "tag:yaml.org,2002:binary") {
    // Process as binary value
} else if (bool bool_value; YAML::convert<bool>::decode(node, bool_value)) {
    // Process as bool value
} else if (int64_t int_value; YAML::convert<int64_t>::decode(node, int_value)) {
    // Process as int value
} else if (double double_value; YAML::convert<double>::decode(node, double_value)) {
    // Process as double value
} else {
    // Process as string value
}
To disambiguate between different integer and floating-point types, you can use tags in your YAML, such as !u for uint32_t, !ul for uint64_t, !f64 for double, etc. Detect the tags using node.Tag() and convert the values to the corresponding types accordingly.
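A minimal sketch of that tag dispatch is shown below. It works on plain strings so it can be tried without yaml-cpp; in real code the tag would come from node.Tag() and the text from node.Scalar(), assuming the local tag is reported verbatim as "!u" and so on. The names Tagged, parse_unsigned and convert_tagged are hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <exception>
#include <optional>
#include <string>
#include <system_error>
#include <variant>

using Tagged = std::variant<std::uint32_t, std::uint64_t, double>;

// Parses the whole of `text` as an unsigned integer of type T.
// Rejects negative numbers, trailing characters and out-of-range values.
template <typename T>
std::optional<T> parse_unsigned(const std::string& text) {
    T value{};
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return value;
}

// Hypothetical dispatcher: picks the target type from the custom tag.
// Returns std::nullopt for unknown tags or text that does not parse.
std::optional<Tagged> convert_tagged(const std::string& tag, const std::string& text) {
    if (tag == "!u") {
        if (auto v = parse_unsigned<std::uint32_t>(text)) return Tagged{*v};
        return std::nullopt;
    }
    if (tag == "!ul") {
        if (auto v = parse_unsigned<std::uint64_t>(text)) return Tagged{*v};
        return std::nullopt;
    }
    if (tag == "!f64") {
        try {
            std::size_t used = 0;
            double d = std::stod(text, &used);
            if (used != text.size()) return std::nullopt;
            return Tagged{d};
        } catch (const std::exception&) {
            return std::nullopt;  // not a number, or out of range
        }
    }
    return std::nullopt;
}
```

Returning std::optional rather than throwing keeps "this value does not fit the tag" an ordinary result that the caller can check.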
I posted this as an answer on this StackOverflow post, where someone was asking the same question about disambiguating among different Scalar types.