SOPS displacing comments and causing YAML-breaking invalid syntax error and data loss
Hello everyone; I found a bug in the SOPS encryption, which you can replicate yourself.
Unencrypted file test.yaml:
this:
is:
an: example
# comment
I use age encryption, so I set up the SOPS_AGE_RECIPIENTS and SOPS_AGE_KEY_FILE variables, and then run sops --encrypt --encrypted-regex an test.yaml, obtaining this:
this:
is:
an: ENC[AES256_GCM,data:7ew99FdoYQ==,iv:62k8jSLjg2l7YyorqOf9bobpgT7m+1qeEK66JpyObTE=,tag:caRiamKAO9TqZHj2qfKAgA==,type:str]
sops:
# comment
kms: []
gcp_kms: []
azure_kv: []
hc_vault: []
age:
- recipient: age107p580kfl3mlr2x7jqfhattzv0a3kd88ugjq2p5tsjz59jzdmdfsj2qwm2
enc: |
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBZa3VuTFBucnV4QldrZW0z
am9jTkw1ejZIUjBUVHpRTjRONUNTYnRXWFVRCncwdk5adnpDZktnK3JwVWU0R1o4
UGlIOHpSdUpTQjBhWnMwcXAyZlhmTFEKLS0tIFk2bitXOHN2T2VIbHZzVnVlTVpn
VGNUSkdFWENLSEFQdGh4OW1TMk1HZHMKvMCQo5SXkMk9HQ4Bv7n1UFj+IZDi0z8K
1LJKCovXp7d+zceCwbVM9zfbleF7AvFuyWSbw7o11WpE4vtaD8fDWg==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2022-05-31T14:38:08Z"
mac: ENC[AES256_GCM,data:BNPYx43KNPdggGC/+qgHQfoMj+ctnrwjOpt+pOzoWcQCcS3IP1h0FtD/UcSc8mLTxS2+6gBBiFU8JIdnr8XwueIeGbcx0zEk8LZdspv0DuVQoC5AbiU4qnvCFRY7MIdmSKacG7zXVN4nQ2kW/Ip7jqhBAqZJl5cbUBcaQOgfGCo=,iv:aLQ+lhmgxzyrQphjknd1O1KHC6BVG1E0hM9tCfFs+J4=,tag:sHhaEqLU+/VRDRjTajAnwQ==,type:str]
pgp: []
encrypted_regex: an
version: 3.7.3
As you can see, the # comment line is moved inside the sops metadata; whenever that file is decrypted, the comment will be gone (hence the data loss I mention in the title).
The reason why this is important is because my team is using some comment "tags" to mark code blocks, and the last tag disappears after encrypting-and-decrypting a file.
I did some extra testing to try to understand when this was happening, and I gathered some conclusions: it only happens if the indentation is at least the one in my example file (so at least 3) and the if the comment is "less indented" than the last line.
Sorry if this is a known issue and I missed it! :)
Adding to this issue, it can break a YAML file, by producing an invalid syntax.
To reproduce, use this starting file test.yaml:
this:
is:
an: example
# comment
someArray: []
Again, run sops --encrypt --encrypted-regex an test.yaml. This will produce:
this:
is:
an: ENC[AES256_GCM,data:BVubel9Etw==,iv:eSU4vD0Gl+ArqJYXz601lTYCcpuXGfdMj5ViU2jGOaE=,tag:Du0JCAnjc+pGYBRBqpIg+w==,type:str]
someArray:
# comment
[]
sops:
kms: []
gcp_kms: []
azure_kv: []
hc_vault: []
age:
- recipient: age1atpv5v2yrfuxxpe2gpddfq2n5g7vck89eg8fjqytrjrxy7plqfcs7l7788
enc: |
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBrUDRtTjRCaktvbnhFQVpq
ZGRyVm9rL2YrOUNYdnExOXVlQWhiczhZOVNvClFBVTZyWjh4ck53aE55cC9pZThv
NExTS3BpZFdpWG9tMFZOeUZ3dFNtYncKLS0tIFRBTmYwNzd3dXZGN3pOYlBraS9y
aklTbVpZNm54MjNVaXQ5bTRCQURkWU0KkoNrtSRxjGBjCGjAR8HRZqRH8tAZ432B
HP7spoDOqtsRUuZPhBqQpvV+mP50z9ztNqrw5f8MDIirXsiw1qvYyg==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2022-05-31T16:21:18Z"
mac: ENC[AES256_GCM,data:BBYePXfdlh6EwLdvft7OncjUWpE2WrLdhtaKNONPZymdv7q23QEGJJxG3ktE02PcdS2aDsflwM3wSkPq5PPQSQXVZbtIMutKCdrKZGerEPUSF6KXy00CqiLn12kbO31JIisw5K+JvmhQ0vxopPAn0o4w0hABvy9Pt/ZKFt8pjkU=,iv:n+hzjCnrxQL986BmrTbToS3dWiWKPK8uRxdN+Dqc8xA=,tag:CGKCtco5LiZyhAbInkBaAQ==,type:str]
pgp: []
encrypted_regex: an
version: 3.7.3
As you can see, # comment got in the middle of the empty array definition. This breaks the YAML syntax; the file can't even be decrypted anymore!
Thanks for this report! I can confirm that this happens with the latest version.
Interestingly I cannot reproduce this when removing the empty line after the comment... Then in both cases it works fine. But if there is an empty line after the comment, the error happens.
#1069 should fix this issue. There is the other problem that if there is no newline, that the comment will be parsed as a foot comment for the innermost map entry (an: example) and thus it will be indented on another level after re-serialization, but that seems to be something caused by yaml.v3.
I also noticed it can displace multiple comments into the sops metadata, as long as they are in consecutive lines after the # comment in my example and they are "indented enough"
But anyway, thanks for addressing it quickly! Hope it gets reviewed and merged soon :)
Just wanted to add to this issue, as i think it might be the same problem. There is a helm chart values.yaml file for the kube-prometheus-stack located at https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml that also causes invalid syntax when you try to encrypt and decrypt the file.
# sops --version
sops 3.7.3 (latest)
It also seems like the output of the encryption removes all blank lines that might have been present in the original file.
In addition, some of the indention is also getting changed. For example the first entry, "defaultRules:"
@BlackBsd indeed, there are more such bugs. A minimal reproducer for the issue you reported is
a:
b:
c:
d: ""
# e
# f
g: {}
It also seems like the output of the encryption removes all blank lines that might have been present in the original file.
In addition, some of the indention is also getting changed. For example the first entry, "defaultRules:"
These are both normal and to be expected. Sops should produce semantically equivalent YAML files (including comments), not the original YAML file back.
Meh. Now I tried to spend quite some time to figure out why this is still not fixed, only to realize that the fix merged (#1069) works fine, but hasn't been released yet.
So well, this is already fixed on the develop branch since beginning of June...
That is awesome, thank you. When do you foresee this making it into a new release?
I wish I would know... @ajvb do you have any indication when a new (bugfix) release will happen?
any update on this?
There has not been a new release, so: no. Unfortunately not. :(
This will be included in the upcoming release.
In version 3.8.1 It is still displacing comments in YAML