[enhancement]: Support fastjsonschema as well as jsonschema for schema validator
Enhancement
In RHEL, we are looking to drop support for jsonschema and would like to use fastjsonschema instead. The API documentation is here: https://horejsek.github.io/python-fastjsonschema/
It would be nice if cloud-init could support both Python libraries for JSON schema validation.
Good suggestion @ani-sinha. I can see it packaged in the Alpine, Rawhide, Fedora, Debian and Ubuntu archives as well, so this switch should appeal to distributions that have the package available. This will be a fairly complex bit of work, because cloudinit.config.schema is tightly coupled to the following jsonschema APIs for its custom extensions:
- jsonschema.ValidationError
- jsonschema.Draft4Validator, FormatChecker
- jsonschema.exceptions.best_match, SchemaError
- jsonschema.validators.create
But this is definitely worth exploring further to size the effort involved.
To test the impact on boot, I put util.log_time around our current schema validation performed during early boot, and it averages 0.003 seconds. Given that adapting our custom annotations, errors and deprecation handling to fastjsonschema is a fairly complex bit of work, it's unlikely that upstream will prioritize this effort, as python3-jsonschema is functional and adds only a minimal cost to boot times.
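For anyone wanting to reproduce that kind of measurement outside cloud-init, here is a minimal sketch using only the standard library. `timed_call` is a hypothetical stand-in for util.log_time, not its actual implementation, and `placeholder_validate` is a trivial placeholder rather than cloud-init's real schema validation.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds).

    Hypothetical helper for illustration; not cloud-init's util.log_time.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def placeholder_validate(config):
    """Placeholder check: return top-level keys that are not strings."""
    return [key for key in config if not isinstance(key, str)]


if __name__ == "__main__":
    errors, elapsed = timed_call(
        placeholder_validate, {"hostname": "example-host"}
    )
    print(f"validation took {elapsed:.6f}s, {len(errors)} error(s)")
```

Swapping the placeholder for a real validator call on each library would give a like-for-like comparison on the same config.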
We would welcome patches for this work and would help shepherd those changes in. If the community takes up this effort, we would like to make sure that we retain our schema error and deprecation annotation functionality:
cat > example.yaml <<EOF
#cloud-config
# Basic system setup
hostname: example-host
bogus: asdf
# Package management
apt_update: true
package_upgrade: true
packages:
- git
- nginx
- python3
EOF
lxc launch ubuntu-daily:oracular test-5764 -c security.nesting=true -c cloud-init.user-data="$(cat example.yaml)"
lxc exec test-5764 -- cloud-init status --wait
lxc exec test-5764 bash
root@example-host:~# cloud-init schema --system --annotate
Found cloud-config data types: user-data, network-config
1. user-data at /var/lib/cloud/instances/e4605433-ac8c-4514-873a-6661cb8ae7ac/cloud-config.txt:
#cloud-config
# from 1 files
# part-001
---
apt_update: true # D1
bogus: asdf # E1
hostname: example-host
package_upgrade: true
packages:
- git
- nginx
- python3
...
# Errors: -------------
# E1: Additional properties are not allowed ('bogus' was unexpected)
# Deprecations: -------------
# D1: Deprecated in version 22.2. Use **package_update** instead.
2. network-config at /var/lib/cloud/instances/e4605433-ac8c-4514-873a-6661cb8ae7ac/network-config.json:
Valid schema network-config
Error: Invalid schema: user-data
I think the main motivation for switching from jsonschema to fastjsonschema is to reduce the install footprint, not to cut validation time. E.g. on my Fedora 40 I see:
$ rpm -q --requires python3-fastjsonschema | grep -v rpmlib
python(abi) = 3.12
$ rpm -q --requires python3-jsonschema | grep -v rpmlib
/usr/bin/python3
python(abi) = 3.12
python3.12dist(attrs) >= 22.2
python3.12dist(jsonschema-specifications) >= 2023.3.6
python3.12dist(referencing) >= 0.28.4
python3.12dist(rpds-py) >= 0.7.1
Schema validation is optional, which currently leaves two options: install jsonschema and all its dependencies and get full validation, or don't install it and skip validation completely. Maybe there's room for a third option, e.g. install fastjsonschema and get some 'basic' validation (errors only)?
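A rough sketch of what that third option could look like, assuming a simple import-time fallback. The names `select_backend` and `validate_errors_only` are hypothetical and not part of cloud-init, and the schema in the usage note is a toy one. The sketch covers errors only: deprecation annotations and best_match-style error selection are left out. Note also that fastjsonschema raises on the first failure, so at most one message comes back from that backend.

```python
# Hypothetical fallback between the two libraries; not cloud-init code.
try:
    import jsonschema  # full-featured backend
except ImportError:
    jsonschema = None

try:
    import fastjsonschema  # lightweight backend
except ImportError:
    fastjsonschema = None


def select_backend():
    """Return the name of the first available backend, or None."""
    if jsonschema is not None:
        return "jsonschema"
    if fastjsonschema is not None:
        return "fastjsonschema"
    return None


def validate_errors_only(config, schema):
    """Return a list of error messages.

    The list is empty when the config is valid, and also when no
    backend is installed (validation is then skipped entirely).
    """
    backend = select_backend()
    if backend == "jsonschema":
        validator = jsonschema.Draft4Validator(schema)
        # iter_errors yields every violation rather than stopping early.
        return [err.message for err in validator.iter_errors(config)]
    if backend == "fastjsonschema":
        try:
            fastjsonschema.validate(schema, config)
        except fastjsonschema.JsonSchemaValueException as err:
            # Raised on the first violation only.
            return [err.message]
        return []
    return []
```

With a toy schema such as `{"type": "object", "properties": {"hostname": {"type": "string"}}, "additionalProperties": False}`, the `bogus: asdf` key from the example above would be reported by either backend, though the message wording differs between the two libraries.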