rapidyaml icon indicating copy to clipboard operation
rapidyaml copied to clipboard

Empty strings are parsed as null

Open Gei0r opened this issue 3 years ago • 6 comments

#include <ryml.hpp>
#include <iostream>

int main()
{
    ryml::Tree tr;
    tr.rootref() |= ryml::MAP;
    tr.rootref()["foo"] = "";
    std::cout << tr;
}

Expected output: foo: '', actual output: foo: (trailing space)

History:

  • Prior to 2707b25d7932bf70daf52edcbd8895ca45de85be: foo: '' (correct imo)
  • Starting with 2707b25d7932bf70daf52edcbd8895ca45de85be: foo: ~
  • Starting with cd901df9d046e1cd41142dff5c5434196a2bc7d3: foo: (trailing space)

Maybe key_is_null()/val_is_null() should be taken into account when emitting?

Gei0r avatar Jun 08 '22 09:06 Gei0r

Maybe key_is_null()/val_is_null() should be taken into account when emitting?

Won't help, because the string is already interpreted as null during parsing:

#include <iostream>
#include <ryml.hpp>

int main()
{
    ryml::Tree tr;
    ryml::NodeRef root = tr.rootref();
    root |= ryml::MAP;
    root["foo"] = "";

    std::cout << "Val null: " << root["foo"].val_is_null() << "\n";
    std::cout << tr;
}

Output:

Val null: 1
foo: 

Gei0r avatar Jun 09 '22 18:06 Gei0r

Yes, this one has been challenging me, thanks for reporting and for the bisection.

Off the top of my head, I think val_is_null() should be false here. This is not an empty string, but rather a zero-length string.

I would expect something like this:

#include <iostream>
#include <ryml.hpp>

int main()
{
    ryml::Tree tr;
    ryml::NodeRef root = tr.rootref();
    root |= ryml::MAP;
    root["foo"] = "";
    root["bar"] = ryml::csubstr();

    // points at an address, zero length - not null
    assert(root["foo"].val_is_null() == false);
    // points at no address, so is null
    assert(root["bar"].val_is_null() == true);
}

So that this would emit {foo: '', bar: }.

But there may be other subtle factors at play, and purely changing val_is_null() may have deeper undesired consequences, which may need to be tackled on their own. Or maybe this is from the programatic tree. What happens if we parse {foo: '', bar: }?

I need to investigate, but am really short of time ATM. Expect a delay before a solution exists. If you need a speedy solution, I'd be happy to accept a PR for this.

biojppm avatar Jun 10 '22 08:06 biojppm

I agree with your idea of how it should look like. I think I can take a look over the weekend, maybe just start with some test cases.

Gei0r avatar Jun 10 '22 15:06 Gei0r

My testing so far has revealed the following, which you maybe already knew:

  1. c4::basic_substring considers zero-length strings to be "equal" to nullptr (here)
  2. Therefore, when a yaml string is zero-length and not quoted, it is regarded as being null.
  3. When parsing a yaml document, that's never a problem. Because the only way to put zero-length strings in a yaml document is to quote them, and the quoting is taken into account by val_is_null() (and when emitting)

I see two ways to get around this issue:

  1. When setting a zero-length string to a yaml node programatically (not when parsing), always set the "quoted" flag (VALQUO/KEYQUO) OR

  2. Fundamentally rework what is considered a "null" string. For example, differentiate between:

    1. len == 0 && str == nullptr → This is a "null" string
    2. len == 0 && str != nullptr → This is a zero-length, but non-null string

    When parsing, "null" strings would have to be created for unquoted zero-length scalars.

Gei0r avatar Jun 14 '22 19:06 Gei0r

1 is bad because there are several ways that may be done, and it is easy to miss one or more of them; and most of all it is bad because as a user if I set a string, I don't expect VALQUO/KEYQUO to turn on.

OTOH, 2 is semantically sane. But the implications of fixing this may be deep enough to require looking at other parts which may break. Hopefully the high coverage will help here.

biojppm avatar Jun 16 '22 15:06 biojppm

Let's continue the discussion at the merge request

Gei0r avatar Jun 16 '22 20:06 Gei0r