JSON output support with --json/-J option
This PR revamps #141 and fixes all the complaints. It adds support for outputting pretty-printed JSON with variable indentation (--json=n), outputting compact non-pretty-printed JSON (--json=0) and outputting the --vcount and --outage options' results in JSON format.
~~Unfortunately choosing a custom indentation or disabling pretty-printing with -J 4 or -J 0 does not work, because the parser can't tell the difference between an option's optarg and a non-option argument (e.g. an IP address).~~
Seems like the parser accepts -J4 or -J0.
Coverage increased (+0.6%) to 80.131% when pulling 3923fa51853dc23faf75b5658b30d45cc0ae36e9 on Reperator:json-output into 12961d59eca1c1647676c634a41af9eb7f23e499 on schweikert:develop.
Thanks for the patch! I will need some time to review it / integrate it, but I just wanted to already say that it's much appreciated :-)
Considering that the JSON output is going to be an interface and we won‘t be able to change it later, we should make sure that the schema is how we want to keep it long term. Seeing the current output:
fping --json -n -A -c3 -p100 dns.google.com
{
"hosts": {
"dns.google.com (2001:4860:4860::8844)": {
"xmt": 3,
"rcv": 3,
"loss_percentage": 0,
"min": 3.32,
"avg": 3.32,
"max": 3.33
}
}
}
I see a couple of issues:
- The schema changes depending on the options that you use. If for example -C3 is used instead of -c3, then the values are the rtt measurements as array instead of the keys „xmt“, etc. We probably should use a consistent schema independently of what options are used, and just fill what is relevant.
- Using -n and -A combined forces the user to parse the output still (to get the name and address). They probably should be keys instead.
- „loss_percentage“ is quite long compared to the other keys. I also wonder whether this is needed at all (it is reduntant)
- Maybe the key "hosts" could be renamed "summary" so that in the future also other output could be intergrated (the output that comes without -q, which --json currently implies)
How the output could look like:
{
"summary": {
"dns.google.com“: {
"addr": "2001:4860:4860::8844",
"xmt": 3,
"rcv": 3,
"min": 3.32,
"avg": 3.32,
"max": 3.33,
"rtt": [ 3.32, 3.32, 3.33 ]
}
}
}
Let me know what you think. I can also work on making these changes.
- The schema changes depending on the options that you use. If for example -C3 is used instead of -c3, then the values are the rtt measurements as array instead of the keys „xmt“, etc. We probably should use a consistent schema independently of what options are used, and just fill what is relevant.
Good point. I think it best if we were to always include every possible field and null out everything that wouldn't be included in non-JSON output, e.g. something like
{
"summary": {
"dns.google.com": {
"addr": "2001:4860:4860::8844",
"host": "dns.google.com",
"xmt": 3,
"rcv": 3,
"outage": null,
"min": 3.32,
"avg": 3.32,
"max": 3.33,
"rtt": null
}
}
}
or
{
"summary": {
"2001:4860:4860::8844": {
"addr": "2001:4860:4860::8844",
"host": null,
"xmt": null,
"rcv": null,
"outage": null,
"min": null,
"avg": null,
"max": null,
"rtt": [
3.32,
3.32,
3.33
]
}
}
}
OTOH this is kind of wasteful since we can easily do the summary calculations even if we print all rtts anyway. Opinions?
- Using -n and -A combined forces the user to parse the output still (to get the name and address). They probably should be keys instead.
I'll add an addr key and a host key that include address and hostname respectively. I'll only include the hostname if the user provided it or if we've done rDNS resolution due to -n.
- „loss_percentage“ is quite long compared to the other keys. I also wonder whether this is needed at all (it is reduntant)
Technically most keys are redundant as long as we include the rtts. I'll throw out return_percentage and loss_percentage and keep the others.
- Maybe the key "hosts" could be renamed "summary" so that in the future also other output could be intergrated (the output that comes without -q, which --json currently implies)
Will do.
Let me know what you think. I can also work on making these changes.
No worries, I'll take care of it.
Edit: Do you want me to squash my commits?
Edit 2: Another thing I noticed is the issue of duplicate keys. Suppose a user runs fping -A -n -J 127.0.0.1 localhost or even simpler fping -A -n -J localhost localhost. Both summary entries would have the key localhost. JSON allows this, but I can imagine many parsers failing on this, especially if the JSON is supposed to be parsed as a programming language structure such as a hash table or an object. The problem is that identical keys may only become apparent after we have already done rDNS resolution. For now I'll just output duplicate keys; if this becomes a problem we may have to fail if we were about to output duplicate keys, or assign duplicate keys identifiers such as localhost (1), or instead of summary being an object we could make it an array. This would alleviate these problems but necessitates iterating over the array to find a specific entry (and we would still have entries that have identical addresses and hostnames).
Reperator, thanks for picking this back up! I'm encouraged to see others found value in the original suggestion. I've been using fping 4.2 with my original patch suggestions, but I look forward to your fix-ups and enhancements!
Any news about this? It will be really awesome to have json output!
My two cents is that the schema needs work. The point about the schema changing is good, but in my opinion the "null" suggestion is not quite right. Json consuming code that has to distinguish between a field being null and a field not being present is almost always going to have bugs.
" We probably should use a consistent schema independently of what options are used, and just fill what is relevant." is a good approach, and allows for additions to the schema in the future while preserving backwards compatibility with any JSON consumer that is not going to be confused by new keys (Another good practice IMHO)
I'm sorry, I completely forgot about this. I'll start working on it today; I hope I can get it finished soon-ish.
Json consuming code that has to distinguish between a field being null and a field not being present is almost always going to have bugs.
Agreed, I'll keep the schema consistent across options.
What about using numeric keys instead of IP address as a key? This should eliminate the issue with repeated IPs?
In that case we can just use JSON arrays. My main gripe with that is that the API for JSON objects is much nicer than the API for JSON arrays in many languages (hashtables vs. lists).
In that case we can just use JSON arrays. My main gripe with that is that the API for JSON objects is much nicer than the API for JSON arrays in many languages (hashtables vs. lists).
I think this should not be an issue. Every JSON consumer can work with arrays with ease. And all the API's that I know about are using json arrays in their json responses. P.S.: Maybe RTT's responses should also be array? If number of pings (-c) is greater than 1 we can include all the RTTs inside an array?
Not a big deal, I am already running fping and parsing the string results into json and passing them further along to other software... This would apply to yaml or xml too, I do the conversion on more of a line by line basis and emit a stream of json documents. In my case those documents have three different schemas, "progress" generated once per ping per IP, "result" generate once per IP, and "verbose result" which extends the result schema and has the verbose results, big surprise.
If this were of interest, https://en.wikipedia.org/wiki/JSON_streaming has some suggestions on outputting multiple json objects over a stream.
I can share an example as a different straw man to beat on.
//Ignore insonsistent numbers, I edited by hand for readability.
//Progress Results
{"ip": "8.8.8.8", "seq": 0, "size": 96, "rtt": 20.47, "rttAvg": 20.47, "loss": 0}
{"ip": "8.8.4.4", "seq": 0, "size": 96, "rtt": 23.73, "rttAvg": 23.73, "loss": 0}
{"ip": "8.8.8.8", "seq": 1, "size": 96, "rtt": 21.27, "rttAvg": 21.00, "loss": 0}
{"ip": "8.8.4.4", "seq": 1, "size": 96, "rtt": 24,00, "rttAvg": 23.53, "loss": 0}
{"ip": "8.8.4.4", "seq": 2, "size": 96, "rtt": 21.27, "rttAvg": 22.47, "loss": 0}
{"ip": "8.8.8.8", "seq": 2, "size": 96, "rtt": 22.73, "rttAvg": 21.27, "loss": 0}
{"ip": "8.8.8.8", "seq": 3, "size": 96, "rtt": 22.50, "rttAvg": 21.47, "loss": 0}
//lost #3 for 8.8.4.4
{"ip": "8.8.8.8", "seq": 4, "size": 96, "rtt": 22.73, "rttAvg": 22.00, "loss": 0}
{"ip": "8.8.4.4", "seq": 4, "size": 96, "rtt": 25.53, "rttAvg": 24.27, "loss": 0.2}
//Result (or verbose result. not both)
{"ip": "8.8.8.8", "sent": 5, "received": 5, "loss": 0, "rtt": {"min": 21.29, "avg": 22.29, "max": 23.89}}
//VerboseResult (note null in rtts array for lost packet)
{"ip": "8.8.4.4", "sent": 5, "received": 4, "loss": 0.20, "rtt": {"min": 21.45, "avg": 24.34, "max": 26.36}, "jitter": {"min": 0.266, "avg": 2.95, "max": 5.91}, "rtts": [23.43,null,21.17,26.76,25.26]}
{"ip": "8.8.8.8", "sent": 5, "received": 5, "loss": 0, "rtt": {"min": 20.62, "avg": 22.16, "max": 22.53}, "jitter": {"min": 0.1875, "avg": 0.949, "max": 1.719}, "rtts": [20.86,21.18,22.95,22.56,22.49]}
Not a big deal, I am already running fping and parsing the string results into json and passing them further along to other software... This would apply to yaml or xml too, I do the conversion on more of a line by line basis and emit a stream of json documents. In my case those documents have three different schemas, "progress" generated once per ping per IP, "result" generate once per IP, and "verbose result" which extends the result schema and has the verbose results, big surprise.
If this were of interest, https://en.wikipedia.org/wiki/JSON_streaming has some suggestions on outputting multiple json objects over a stream.
I can share an example as a different straw man to beat on.
//Ignore insonsistent numbers, I edited by hand for readability. //Progress Results {"ip": "8.8.8.8", "seq": 0, "size": 96, "rtt": 20.47, "rttAvg": 20.47, "loss": 0} {"ip": "8.8.4.4", "seq": 0, "size": 96, "rtt": 23.73, "rttAvg": 23.73, "loss": 0} {"ip": "8.8.8.8", "seq": 1, "size": 96, "rtt": 21.27, "rttAvg": 21.00, "loss": 0} {"ip": "8.8.4.4", "seq": 1, "size": 96, "rtt": 24,00, "rttAvg": 23.53, "loss": 0} {"ip": "8.8.4.4", "seq": 2, "size": 96, "rtt": 21.27, "rttAvg": 22.47, "loss": 0} {"ip": "8.8.8.8", "seq": 2, "size": 96, "rtt": 22.73, "rttAvg": 21.27, "loss": 0} {"ip": "8.8.8.8", "seq": 3, "size": 96, "rtt": 22.50, "rttAvg": 21.47, "loss": 0} //lost #3 for 8.8.4.4 {"ip": "8.8.8.8", "seq": 4, "size": 96, "rtt": 22.73, "rttAvg": 22.00, "loss": 0} {"ip": "8.8.4.4", "seq": 4, "size": 96, "rtt": 25.53, "rttAvg": 24.27, "loss": 0.2} //Result (or verbose result. not both) {"ip": "8.8.8.8", "sent": 5, "received": 5, "loss": 0, "rtt": {"min": 21.29, "avg": 22.29, "max": 23.89}} //VerboseResult (note null in rtts array for lost packet) {"ip": "8.8.4.4", "sent": 5, "received": 4, "loss": 0.20, "rtt": {"min": 21.45, "avg": 24.34, "max": 26.36}, "jitter": {"min": 0.266, "avg": 2.95, "max": 5.91}, "rtts": [23.43,null,21.17,26.76,25.26]} {"ip": "8.8.8.8", "sent": 5, "received": 5, "loss": 0, "rtt": {"min": 20.62, "avg": 22.16, "max": 22.53}, "jitter": {"min": 0.1875, "avg": 0.949, "max": 1.719}, "rtts": [20.86,21.18,22.95,22.56,22.49]}
I think showing the "progress" per IP per ping is a must! Either by adding this into the final response or outputting it while running. Right now the proposed implementation will imply the -q flag which is going to hide per-target/per-ping results
IMHO though I could be over-complicating, flags would allow you to choose between multiple json document output like I suggested, or a single json output with all the results, possibly including progress.
This could be as simple as either one json object per line, or starting with a [ adding a , after each json object per line, and ending with ]:
[
{"ip": "8.8.8.8", "seq": 0, "size": 96, "rtt": 20.47, "rttAvg": 20.47, "loss": 0}
,
{"ip": "8.8.4.4", "seq": 0, "size": 96, "rtt": 23.73, "rttAvg": 23.73, "loss": 0}
,
{"ip": "8.8.8.8", "seq": 1, "size": 96, "rtt": 21.27, "rttAvg": 21.00, "loss": 0}
,
{"ip": "8.8.4.4", "seq": 1, "size": 96, "rtt": 24,00, "rttAvg": 23.53, "loss": 0}
,
..... output omitted
]
But it could be more complex:
{
progress: [ ...array of progress... ],
results: { ...hash of results... },
options: { .. I love when programs output the options they were run with in structured output like this... }
}
fping --jsonStream | someCommandThatProcessesOneLineAtATimeWhereEachLineIsAValidJsonObject
fping --jsonArray > myValidJsonFile.json
I'm a bit busy at work at the moment so it might take some time until I can get back to this PR, sorry. :(
Thank you, @Reperator, this is really useful feature for us, talking from Zabbix perspective! I'd agree with @bkuker that it makes more sense to not list tags that have no values.
OK here are my 2 cents. I have grouped the output in targets, within those there is data for each packet and summary. Also, I have added some ideas about error handling:
{
"error": "in case of error",
"targets": {
"dns.google.com": {
"addr": "8.8.8.8",
"packets": [
{
"bytes": 64,
"rtt": 3.29,
"avg": 3.35,
"loss": 11,
"source": "4.4.4.4"
},
{
"error": "timed out"
}
],
"summary": {
"min": 21.29,
"avg": 22.29,
"max": 23.89
}
},
"7.7.7.7": {
"addr": "7.7.7.7",
"packets": [
{
"error": "timed out"
},
{
"error": "timed out"
}
],
"summary": {}
}
}
}
anyone still working on this or it was forgotten?
I'm not working on it anymore, I left my old company a year ago and don't have the time to look into this again.