clevis report shows error but exits with 0
I might have found a bug or edge case. An error is shown saying that the JSON of an advertisement is invalid, but the exit code is 0 instead of 1. This happened with a clevis config that uses "sss" with two tang servers. It only happened occasionally in my test setup, and I suspect the endpoint returned an empty or otherwise malformed response, possibly because of a network error. Unfortunately, it's hard for me to reproduce.
root@host:~# clevis luks report -d /dev/sda2 -s1
Invalid json!
Usage: jose fmt [OPTIONS]
Converts JSON between serialization formats
-X --not Invert the following assertion
-O --object Assert TOP to be an object
-A --array Assert TOP to be an array
-S --string Assert TOP to be a string
-I --integer Assert TOP to be an integer
-R --real Assert TOP to be a real
-N --number Assert TOP to be a number
-T --true Assert TOP to be true
-F --false Assert TOP to be false
-B --boolean Assert TOP to be a boolean
-0 --null Assert TOP to be null
-E --equal Assert TOP to be equal to PREV
-Q --query Query the stack by deep copying and pushing onto TOP
-M # --move=# Move TOP back # places on the stack
-U --unwind Discard TOP from the stack
-j JSON --json=JSON Parse JSON constant, push onto TOP
-j FILE --json=FILE Read from FILE, push onto TOP
-j - --json=- Read from STDIN, push onto TOP
-c --copy Deep copy TOP, push onto TOP
-q STR --quote=STR Convert STR to a string, push onto TOP
-o FILE --output=FILE Write TOP to FILE
-o - --output=- Write TOP to STDOUT
-f FILE --foreach=FILE Write TOP (obj./arr.) to FILE, one line/item
-f - --foreach=- Write TOP (obj./arr.) to STDOUT, one line/item
-u FILE --unquote=FILE Write TOP (str.) to FILE without quotes
-u - --unquote=- Write TOP (str.) to STDOUT without quotes
-t # --truncate=# Shrink TOP (arr.) to length #
-t -# --truncate=-# Discard last # items from TOP (arr.)
-i # --insert=# Insert TOP into PREV (arr.) at #
-a --append Append TOP to the end of PREV (arr.)
-a --append Set missing values from TOP (obj.) into PREV (obj.)
-x --extend Append items from TOP to the end of PREV (arr.)
-x --extend Set all values from TOP (obj.) into PREV (obj.)
-d NAME --delete=NAME Delete NAME from TOP (obj.)
-d # --delete=# Delete # from TOP (arr.)
-d -# --delete=-# Delete # from the end of TOP (arr.)
-l --length Push length of TOP (arr./str./obj.) to TOP
-e --empty Erase all items from TOP (arr./obj.)
-g NAME --get=NAME Get item with NAME from TOP (obj.), push to TOP
-g # --get=# Get # item from TOP (arr.), push to TOP
-g -# --get=-# Get # item from the end of TOP (arr.), push to TOP
-s NAME --set=NAME Sets TOP into PREV (obj.) with NAME
-s # --set=# Sets TOP into PREV (obj.) at #
-s -# --set=-# Sets TOP into PREV (obj.) at # from the end
-y --b64load URL-safe Base64 decode TOP (str.), push onto TOP
-Y --b64dump URL-safe Base64 encode TOP, push onto TOP
Advertisement is malformed
root@host:~# echo $?
0
Hello. Thanks for reporting this issue. It is difficult for me to reproduce, but it seems some error propagation is missing in the code. I created a PR for this. If you are able to test it, that would be welcome.
@sarroutbi Thanks for addressing this issue. I currently can't reproduce the same error again, so perhaps your PR fixes it. I also noticed that the report_sss function does not seem to propagate an error if the "$content" variable contains invalid JSON. Maybe it could be fixed with something like this:
local jwe
local jwes
if ! jwes="$(jose fmt --json="${content}" --get jwe --foreach=-)"; then
return 1
fi
for jwe in $jwes; do
jwe="$(printf '%s' "${jwe}" | sed -e 's/"//g')"
report_decode "${jwe}" || return 1
done
But this would likely not fix the error I experienced above, because there it logged "Advertisement is malformed", and every time I try to put my own malformed advertisement in there, it exits correctly with 1... If I find out why it returned 0, I'll post it here.
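For context on why the snippet above declares `jwes` first and assigns it in a separate step: in shell, the exit status of a command substitution is easy to lose. The following is a minimal sketch, not clevis code. `failing_parse` is a hypothetical stand-in for a `jose fmt` call that fails on invalid JSON, and it is only an assumption that the existing function loops directly over the substitution.

```shell
# Hypothetical stand-in for a parser call that fails and prints nothing.
failing_parse() { return 1; }

# Pattern 1: loop directly over the substitution. The failure is lost:
# the loop body runs zero times and the function returns 0.
loop_direct() {
    local item
    for item in $(failing_parse); do
        :
    done
}

# Pattern 2: assign inside `local`. The status is that of `local` itself,
# so the failure of the substituted command is masked.
assign_with_local() {
    local items="$(failing_parse)"
}

# Pattern 3: declare first, assign separately. The assignment carries the
# status of the substituted command, so it can be checked.
assign_then_check() {
    local items
    if ! items="$(failing_parse)"; then
        return 1
    fi
}

direct_rc=0;  loop_direct       || direct_rc=$?
local_rc=0;   assign_with_local || local_rc=$?
checked_rc=0; assign_then_check || checked_rc=$?
printf 'direct=%s local=%s checked=%s\n' "$direct_rc" "$local_rc" "$checked_rc"
```

Only the third pattern reports the failure to its caller, which is the behaviour the proposed change relies on.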
Hello. Thanks. I will include it in the PR also
@sarroutbi After some debugging, I am a bit confused by the behaviour of the script. I'll post some of my findings and thoughts here in case they are useful in some way. Here I am using a node with a "sss" pin and three tang servers. I experimented with turning some servers off to see what happens.
- Here one of the tang servers has a new key that the node wants to add, but tang-2 is not reachable. Somehow a binding is generated and the command exits with 0 even though errors are printed.
root@host:~# clevis luks report -r -d /dev/vda2 -s1
Unable to fetch advertisement (http://tang-server-2/adv)
The following keys are not in the current advertisement and were probably rotated:
spL123tXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Regenerating binding (device /dev/vda2, slot 1):
Pin: sss, Config: '{"t":1,"pins":{"tang":[{"url":"http://tang-server-1"},{"url":"http://tang-server-2"},{"url":"http://tang-server-3"}]}}'
Error communicating with the server!
Warning: Value 512 is outside of the allowed entropy range, adjusting it.
Binding regenerated successfully
root@host:~# echo $?
0
I cannot reproduce this anymore, maybe it was some weird condition on my side...
- It seems that if at least one server is still reachable and no keys were rotated on the reachable server(s), the script exits with 0 even though it has not checked all of them.
root@host:~# clevis luks report -r -d /dev/vda2 -s1
Unable to fetch advertisement (http://tang-server-1/adv)
Unable to fetch advertisement (http://tang-server-2/adv)
root@host:~# echo $?
0
This is reproducible.
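One way to make this case visible in the exit code would be to remember any fetch failure while still checking the remaining servers. This is only a sketch of the idea, not the actual clevis implementation: `fetch_adv` and `check_servers` are hypothetical names, and the stub simply pretends that tang-server-1 and tang-server-2 are unreachable.

```shell
# Hypothetical stub: tang-server-1 and tang-server-2 are treated as down,
# any other URL "returns" a placeholder advertisement.
fetch_adv() {
    case "$1" in
        http://tang-server-1|http://tang-server-2) return 1 ;;
        *) printf 'placeholder advertisement\n' ;;
    esac
}

# Check every server, keep going after a failure, and report through the
# return code whether any advertisement could not be fetched.
check_servers() {
    local url failed=0
    for url in "$@"; do
        if ! fetch_adv "${url}" >/dev/null; then
            echo "Unable to fetch advertisement (${url}/adv)" >&2
            failed=1
        fi
    done
    return "${failed}"
}

rc=0
check_servers http://tang-server-1 http://tang-server-2 http://tang-server-3 || rc=$?
echo "exit code: ${rc}"

ok_rc=0
check_servers http://tang-server-3 || ok_rc=$?
```

Whether an unreachable server should count as an error when the sss threshold can still be met is a design decision. The sketch only shows that the information does not have to be dropped.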
- It seems that in the end I may have corrupted something in the slot, as it asks for the LUKS password even though one tang server should still be usable for recovery and only two of them rotated keys that must be renewed. I assume that at some point it created a binding with only a subset of the three tang servers when I ran the command while one or more were offline :thinking: I am also noticing that it asks for a password even though I am passing the -q option. It should probably fail instead, since in cronjobs this may not lead to a non-zero exit code.
root@host:~# clevis luks report -r -q -d /dev/vda2 -s1
The following keys are not in the current advertisement and were probably rotated:
w-0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
uNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Regenerating binding (device /dev/vda2, slot 1):
Pin: sss, Config: '{"t":1,"pins":{"tang":[{"url":"http://tang-server-1"},{"url":"http://tang-server-2"},{"url":"http://tang-server-3"}]}}'
No key available with this passphrase.
Enter existing LUKS password:
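Until the exit codes are reliable, a cronjob could guard against both problems with a small wrapper. The sketch below is an assumption-laden workaround, not part of clevis: `run_noninteractive` and `ERROR_PATTERNS` are hypothetical names, the patterns are copied from the messages shown in this thread, and it assumes the password prompt reads from standard input, so redirecting it from /dev/null makes the prompt fail instead of wait.

```shell
# Messages seen in this thread that should count as failures.
ERROR_PATTERNS='Invalid json|Advertisement is malformed|Unable to fetch advertisement|Error communicating with the server'

# Run a command with stdin detached, echo its combined output, and return
# non-zero if the command failed or printed one of the known error messages.
run_noninteractive() {
    local out rc=0
    out="$("$@" 2>&1 </dev/null)" || rc=$?
    printf '%s\n' "${out}"
    if [ "${rc}" -eq 0 ] && printf '%s\n' "${out}" | grep -Eq "${ERROR_PATTERNS}"; then
        rc=1
    fi
    return "${rc}"
}

# Stub standing in for a report run that prints an error but exits with 0.
fake_report() {
    echo "Unable to fetch advertisement (http://tang-server-1/adv)" >&2
    return 0
}

rc=0
run_noninteractive fake_report || rc=$?
echo "wrapper exit code: ${rc}"

clean_rc=0
run_noninteractive true || clean_rc=$?
```

In a real cronjob the stub would be replaced by the actual command line, for example `run_noninteractive clevis luks report -r -q -d /dev/vda2 -s1`.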