Draft: Speeding up pysigma by using a faster yaml parser
This is currently a draft. Would close #128
The goal of this PR to make it possible to use a yaml parser that has the most intensive parts written in rust. This can considerable increase the speed of a large amount of yaml documents.
To install the rust parts you would need to install the library in the following way pip install pysigma[rust]. Since there may be slight shift in behaviour this shouldn't be made the standard behaviour. My suggestion is that once major version is introduced we can shift this around so that ryaml becomes the default library used for parsing yaml.
ryaml which is the library written partially in rust does not have safe_load. But it is also not vulnerable to the same attacks that pyyaml was. To exemplify this I present the following poc taken from HackerOne.
Create both of these files in the same directory. You also need to run an older version of pyyaml; say 3.13. pip install ryaml.
config.yaml
!!python/object/apply:subprocess.Popen
- ls
poc.py - need to use an older version of pyyaml 3.13
import os
import yaml
import ryaml
def yaml_config():
with open('config.yaml') as config_file:
return yaml.load(config_file)
def ryaml_config():
with open('config.yaml') as config_file:
return ryaml.load(config_file)
yaml_config()
print("========== yaml =============")
ryaml_config()
print("========== ryaml =============")
It's not vulnerable to the same attack, so it should be considered safe.
Some parts of the rust library was not fully compliant with the yaml spec, which is why some tests are still failing. Currently I'm tracking those required changes here > https://github.com/ethanhs/ryaml/issues/3
Nice, thanks for the cool contribution and the good analysis of the potential vulnerability! I just have some small stuff I will comment here in the PR review, but from my perspective nothing speaks against a merge as this is optional and the bool problem is solved from my perspective. Tests using YAML 1.1 should be fixed to comply with version 1.2. Can you provide the fixed tests as you know which one fail with ryaml?
I noticed that there was one bit lacking, which is that duplicate keys shouldn't be accepted. Right now I'm waiting for a PR in ryaml to be accepted / discussed first, so once that is resolved I'll fix the other bits here.
Closing for now, we can reopen if continued.
@thomaspatzke Possible to reopen this? pyyaml is a turtle, especially when converting the whole of sigma main repo.