msgspec

Replace `PyYAML`

Open: chirizxc opened this issue 2 months ago • 12 comments

Description

msgspec uses PyYAML to parse YAML before validating the data, but PyYAML doesn't fully pass the yaml-test-suite and fails even on small examples like this:

from msgspec.yaml import decode

print(
    decode(
        "! 15",
        type=str,
        strict=True,
    ),  # msgspec.ValidationError: Expected `str`, got `int`
)
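The `int` in the first example comes from how PyYAML handles the non-specific `!` tag: it resolves the scalar exactly as if it were untagged, whereas YAML 1.2 resolves a `!`-tagged scalar to a string. A minimal sketch calling PyYAML directly (which `msgspec.yaml` uses for parsing) shows this, together with two spellings that should already yield `str` today: an explicit `!!str` tag or a quoted scalar.

```python
import yaml

untagged = yaml.safe_load("15")            # plain scalar: implicitly resolved to int
non_specific = yaml.safe_load("! 15")      # PyYAML resolves "!" like an untagged scalar -> int
explicit_str = yaml.safe_load("!!str 15")  # explicit core-schema tag -> str
quoted = yaml.safe_load('"15"')            # quoted scalars are always strings

print(repr(untagged), repr(non_specific), repr(explicit_str), repr(quoted))
```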

From the YAML 1.2 spec:

Example 2.27 Invoice:

from msgspec.yaml import decode

yaml = """\
--- !<tag:clarkevans.com,2002:invoice>
invoice: 34843
date   : 2001-01-23
bill-to: &id001
  given  : Chris
  family : Dumars
  address:
    lines: |
      458 Walkman Dr.
      Suite #292
    city    : Royal Oak
    state   : MI
    postal  : 48046
ship-to: *id001
product:
- sku         : BL394D
  quantity    : 4
  description : Basketball
  price       : 450.00
- sku         : BL4438H
  quantity    : 1
  description : Super Hoop
  price       : 2392.00
tax  : 251.42
total: 4443.52
comments:
  Late afternoon is best.
  Backup contact is Nancy
  Billsmer @ 338-4338."""

print(decode(yaml))
# msgspec.DecodeError: could not determine a constructor for the tag 'tag:clarkevans.com,2002:invoice'
#  in "<unicode string>", line 1, column 5

Also:

from msgspec.yaml import decode

# Example 2.23 Various Explicit Tags
yaml = """\
---
not-date: !!str 2002-04-28

picture: !!binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=

application specific tag: !something |
 The semantics of the tag
 above may be different for
 different documents."""

print(decode(yaml))
# msgspec.DecodeError: could not determine a constructor for the tag '!something'
#   in "<unicode string>", line 10, column 27
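Both errors come from PyYAML's `SafeLoader` refusing any tag it has no constructor for. As a stopgap, and not something `msgspec.yaml` currently does, one could subclass `SafeLoader` and register a catch-all multi-constructor (an empty tag prefix matches every tag without a dedicated constructor) that ignores the tag and builds a plain dict, list, or str. `TagIgnoringLoader` and `_construct_ignoring_tag` are hypothetical names. This sketch discards the tag information and does not change how the `!` tag is resolved.

```python
import yaml


class TagIgnoringLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that tolerates unknown tags."""


def _construct_ignoring_tag(loader, tag_suffix, node):
    # tag_suffix is the full unknown tag (the prefix is empty); it is discarded.
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node, deep=True)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node, deep=True)
    return loader.construct_scalar(node)


# An empty prefix matches every tag that has no dedicated constructor.
# add_multi_constructor copies the registry, so yaml.SafeLoader itself is unchanged.
TagIgnoringLoader.add_multi_constructor("", _construct_ignoring_tag)

invoice = yaml.load(
    "--- !<tag:clarkevans.com,2002:invoice>\ninvoice: 34843\ntotal: 4443.52\n",
    Loader=TagIgnoringLoader,
)
note = yaml.load(
    "application specific tag: !something |\n"
    "  The semantics of the tag\n"
    "  above may be different for\n",
    Loader=TagIgnoringLoader,
)
print(invoice)
print(note)
```

The resulting plain objects could then be validated with `msgspec.convert`.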

chirizxc, Nov 24 '25 20:11

As a replacement, I would like to suggest my library, yaml-rs. (Its dumps() signature seems to differ, so switching encoding over would require breaking changes; I think replacing loads() is possible for now.)

It passes all yaml-test-suite tests and is also faster than PyYAML (it is written in Rust and based on the saphyr parser).

chirizxc, Nov 24 '25 21:11

PyYAML is a YAML 1.1 parser; switching it out for yaml-rs, which is a YAML 1.2 parser, will introduce behavior changes.

Don't get me wrong, I'm all for dropping PyYAML, since it hasn't made any progress toward 1.2 support for the past couple of years, but the original post seems to gloss over the behavior changes this would cause for some.

Related issue: #867

floxay, Nov 25 '25 08:11

> PyYAML is a YAML 1.1 parser, switching it out for yaml-rs which is a YAML 1.2 parser will introduce behavior changes.

It seems so. Could we offer two optional dependency groups, one for each YAML version? Something like:

[project.optional-dependencies]
yaml_v_1_1 = [
  "pyyaml",
]

yaml_v_1_2 = [
  "yaml-rs",
]
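If both extras existed, the wrapper module would still have to decide which backend to import. A minimal sketch, assuming the user picks the YAML version explicitly; the extra names come from the proposal above, while `EXTRA_TO_MODULE` and `load_backend` are hypothetical:

```python
import importlib
import importlib.util
from types import ModuleType

# Hypothetical mapping from the proposed extras to the module each one installs.
EXTRA_TO_MODULE = {"yaml_v_1_1": "yaml", "yaml_v_1_2": "yaml_rs"}


def load_backend(extra: str) -> ModuleType:
    """Import the backend for an explicitly chosen extra, or explain how to install it."""
    module_name = EXTRA_TO_MODULE[extra]  # KeyError for an unknown extra name
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is not installed; try `pip install 'msgspec[{extra}]'`"
        )
    return importlib.import_module(module_name)
```

Requiring an explicit choice avoids silently switching between YAML 1.1 and 1.2 semantics depending on which package happens to be installed.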

chirizxc, Nov 25 '25 08:11

It's up to the maintainers to decide, but I have a question, as I'm unfamiliar with the YAML format: is 1.2 backwards compatible with 1.1? If it is, it would be OK to introduce another library for it, since msgspec is still in "beta" (a 0.Y.Z version means there may be backwards-incompatible changes, which raises the question of when there will be a 1.0.0 release and what features are expected); if not, I wouldn't go for it, since it would mean breaking more than just the API.

jacopoabramo, Nov 25 '25 09:11

@chirizxc is there a strict requirement as to why dumps() has a different signature? Could this be circumvented somehow?

jacopoabramo, Nov 25 '25 09:11

> is 1.2 not backwards compatible with 1.1?

It isn't backwards compatible; see, for example: https://yaml.org/spec/1.2.2/ext/changes/
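A few concrete places where the two versions disagree: PyYAML applies YAML 1.1 implicit typing, while the YAML 1.2 core schema only recognizes `true`/`false` booleans, plain decimal integers (no octal via a leading zero), and `0o`/`0x` prefixes. The sketch below shows PyYAML's actual results; the 1.2 column in the comments is taken from the spec and was not checked against yaml-rs.

```python
import yaml

# PyYAML applies YAML 1.1 implicit typing. The trailing comments show what the
# YAML 1.2 core schema would be expected to give (from the spec, not tested here).
samples = {
    "yes": yaml.safe_load("yes"),    # 1.1: True             | 1.2 core: "yes"
    "NO": yaml.safe_load("NO"),      # 1.1: False            | 1.2 core: "NO"
    "010": yaml.safe_load("010"),    # 1.1: 8 (octal)        | 1.2 core: 10
    "0o10": yaml.safe_load("0o10"),  # 1.1: "0o10" (string)  | 1.2 core: 8
    "1:20": yaml.safe_load("1:20"),  # 1.1: 80 (base 60)     | 1.2 core: "1:20"
}

for text, value in samples.items():
    print(f"{text!r:>7} -> {value!r}")
```

Any config that relies on either side of these rules would parse differently after a switch.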


chirizxc, Nov 25 '25 09:11

> @chirizxc is there a strict requirement as to why dumps() has a different signature? Could this be circumvented somehow?

https://github.com/jcrist/msgspec/blob/a46a2c6f8b5991ebbe90fb1d2a9cb96628fd2311/src/msgspec/yaml.py#L86-L99

and

https://github.com/lava-sh/yaml-rs/blob/main/python/yaml_rs/init.py#L63-L64

chirizxc, Nov 25 '25 09:11

~~Hmm, generally speaking, there shouldn't be any differences, but I'm not sure about allow_unicode=True.~~

import yaml
import yaml_rs

data = {
    "русский": "текст на русском",
    "emoji": "😀🎉",
    "chinese": "中文",
}
print(yaml_rs.dumps(data))
print("\n\n")
print(yaml.dump(data, allow_unicode=True, sort_keys=False))
# ---
# русский: текст на русском
# emoji: 😀🎉
# chinese: 中文
#
#
# русский: текст на русском
# emoji: 😀🎉
# chinese: 中文
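For context on the `allow_unicode` question: with its defaults, PyYAML writes non-ASCII characters as escape sequences and sorts keys, which is why the comparison above passes `allow_unicode=True` and `sort_keys=False`. A small sketch with the same data; the `explicit_start=True` variant adds the leading `---` marker seen in the yaml_rs output (whether yaml_rs always emits it is not established here).

```python
import yaml

data = {
    "русский": "текст на русском",
    "emoji": "😀🎉",
    "chinese": "中文",
}

# Defaults: non-ASCII characters are written as escape sequences and keys are sorted.
default_out = yaml.dump(data)
# The options used in the comparison: keep characters as-is and keep key order.
readable_out = yaml.dump(data, allow_unicode=True, sort_keys=False)
# explicit_start=True adds a leading "---" document marker.
marked_out = yaml.dump(data, allow_unicode=True, sort_keys=False, explicit_start=True)

print(default_out)
print(readable_out)
print(marked_out)
```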

I would also like to note that some features are not fully implemented for dumps() in saphyr: literal style (|) and folded style (>).
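To illustrate what the missing literal style means in practice: PyYAML can be told to emit multi-line strings in literal block style (`|`), for example through a `SafeDumper` subclass with a custom `str` representer. This is a sketch only; `LiteralDumper` and `_represent_str` are hypothetical names and are not taken from msgspec.

```python
import yaml


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass: multi-line strings use literal style."""


def _represent_str(dumper, value):
    # Request "|" for multi-line strings; PyYAML falls back to double quotes
    # if the text cannot be written as a literal block.
    style = "|" if "\n" in value else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", value, style=style)


LiteralDumper.add_representer(str, _represent_str)

address = {"lines": "458 Walkman Dr.\nSuite #292\n", "city": "Royal Oak"}
out = yaml.dump(address, Dumper=LiteralDumper, sort_keys=False)
print(out)
```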

chirizxc, Nov 25 '25 09:11

> ...so it would be ok to introduce another library for it; if not, I wouldn't go for it since it would mean breaking more than just the API.

In general, it would be reasonable to create separate optional dependencies for the different YAML versions. 🤔

chirizxc, Nov 25 '25 10:11

Yes, but the burden of maintaining those dependencies falls on the maintainers.

Besides, there has to be a justification for supporting YAML 1.2 over 1.1. PyYAML is the most used library for it and still doesn't support 1.2, so this would not make msgspec compatible with other parsing libraries.

jacopoabramo, Nov 25 '25 10:11

In my opinion, since all msgspec does for TOML and YAML support is provide a few wrapper functions, adding more optional dependencies to support a "newer" version of YAML looks like the start of wrapper hell.

https://github.com/jcrist/msgspec/blob/0.20.0/src/msgspec/yaml.py is ~190 lines, and about 90% of those are comments, imports, or type hints/overloads. The ~20 lines of conversion logic have this general structure:

def encode(obj, ...):
  data = msgspec.to_builtins(obj, ...)
  return external_lib.dump(data)

def decode(buf, ...):
  obj = external_lib.load(buf)
  return msgspec.convert(obj, ...)

Nothing crazy is going on here; it's all just convenience, but YAML is anything but convenient to work with. Writing your own two helper functions to wrap yaml-rs (or insert your favorite library here :D ) when you know you are dealing with YAML 1.2 probably makes more sense; otherwise this repo will just end up littered with issues like #867.

floxay, Nov 25 '25 10:11

> YAML is anything but convenient to work with.

Highly related.

jacopoabramo, Nov 25 '25 10:11