pySigma icon indicating copy to clipboard operation
pySigma copied to clipboard

Draft: Speeding up pysigma by using a faster yaml parser

Open jonathan-s opened this issue 2 years ago • 2 comments

This is currently a draft. Would close #128

The goal of this PR to make it possible to use a yaml parser that has the most intensive parts written in rust. This can considerable increase the speed of a large amount of yaml documents.

To install the rust parts you would need to install the library in the following way pip install pysigma[rust]. Since there may be slight shift in behaviour this shouldn't be made the standard behaviour. My suggestion is that once major version is introduced we can shift this around so that ryaml becomes the default library used for parsing yaml.

ryaml which is the library written partially in rust does not have safe_load. But it is also not vulnerable to the same attacks that pyyaml was. To exemplify this I present the following poc taken from HackerOne.

Create both of these files in the same directory. You also need to run an older version of pyyaml; say 3.13. pip install ryaml.

config.yaml

!!python/object/apply:subprocess.Popen
- ls

poc.py - need to use an older version of pyyaml 3.13

import os
import yaml
import ryaml


def yaml_config():
    with open('config.yaml') as config_file:
        return yaml.load(config_file)


def ryaml_config():
    with open('config.yaml') as config_file:
        return ryaml.load(config_file)


yaml_config()
print("========== yaml =============")

ryaml_config()
print("========== ryaml =============")

It's not vulnerable to the same attack, so it should be considered safe.

Some parts of the rust library was not fully compliant with the yaml spec, which is why some tests are still failing. Currently I'm tracking those required changes here > https://github.com/ethanhs/ryaml/issues/3

jonathan-s avatar Jul 05 '23 10:07 jonathan-s

Nice, thanks for the cool contribution and the good analysis of the potential vulnerability! I just have some small stuff I will comment here in the PR review, but from my perspective nothing speaks against a merge as this is optional and the bool problem is solved from my perspective. Tests using YAML 1.1 should be fixed to comply with version 1.2. Can you provide the fixed tests as you know which one fail with ryaml?

thomaspatzke avatar Jul 09 '23 22:07 thomaspatzke

I noticed that there was one bit lacking, which is that duplicate keys shouldn't be accepted. Right now I'm waiting for a PR in ryaml to be accepted / discussed first, so once that is resolved I'll fix the other bits here.

jonathan-s avatar Jul 15 '23 18:07 jonathan-s

Closing for now, we can reopen if continued.

thomaspatzke avatar Sep 13 '24 22:09 thomaspatzke

@thomaspatzke Possible to reopen this? pyyaml is a turtle, especially when converting the whole of sigma main repo.

ForkingFrenzy avatar Jul 14 '25 04:07 ForkingFrenzy