[Feature Request] Support for CloudWatch Logs / Mixed JSON logs
I'm trying to write a log format for CloudWatch output, for example from aws logs -f groupname | lnav
Unfortunately, lnav's JSON abilities are currently either very limited or not well documented.
What I'm trying to parse is:
2021-02-04T08:02:01.678000+00:00 kube-apiserver-audit-bf58dc76fc5024c1991437332767b7d6 {\"kind\... JSONDATA}
Where I'm just interested in some fields for the line-format. Unfortunately I cannot write path-notation for JSON-fields or use sqlite like jget(body, '').
Will this improve in the future?
Here is what I have so far:
{
"CloudWatchEKS" : {
"title" : "CloudWatchEKS",
"description" : "Log format used by EKS",
"url" : "http://NEW",
"regex" : {
"basic" : {
"pattern" : "^(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}?(\\.\\d{6}))((\\+|-)\\d{2}:\\d{2}) (?<type>[a-zA-Z0-9\\-]+)-(?<id>[0-9a-f]+) (?<body>{.*})$"
}
},
"json" : false,
"convert-to-local-time": true,
"timestamp-format" : [ "%Y-%m-%dT%H:%M:%S" ,"%Y-%m-%dT%H:%M:%S:%L%z"],
"value" : {
"type" : {
"kind" : "string",
"identifier" : false
},
"id" : {
"kind" : "string",
"identifier" : true
},
"body" : {
"kind" : "json"
}
},
"line-format" : [
{ "field" : "__timestamp__", "timestamp-format" : "%Y-%m-%dT%H:%M:%S" },
" ",
{ "field" : "body" }
],
"sample" : [
{
"line" : "2021-02-04T08:02:01.678000+00:00 kube-apiserver-audit-bf58dc76fc5024c1991437332767b7d6 {\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Metadata\",\"auditID\":\"XXXXXX-939b-4892-8264-6ccf868339ce\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s\",\"verb\":\"update\",\"user\":{\"username\":\"system:kube-scheduler\",\"groups\":[\"system:authenticated\"]},\"sourceIPs\":[\"172.16.119.165\"],\"userAgent\":\"kube-scheduler/v1.18.9 (linux/amd64) kubernetes/d1db3c4/leader-election\",\"objectRef\":{\"resource\":\"leases\",\"namespace\":\"kube-system\",\"name\":\"kube-scheduler\",\"uid\":\"XXXXX-bac0-4db8-9d18-aa385a585c40\",\"apiGroup\":\"coordination.k8s.io\",\"apiVersion\":\"v1\",\"resourceVersion\":\"175737\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2021-02-04T08:02:01.452753Z\",\"stageTimestamp\":\"2021-02-04T08:02:01.458466Z\",\"annotations\":{\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"system:kube-scheduler\\\" of ClusterRole \\\"system:kube-scheduler\\\" to User \\\"system:kube-scheduler\\\"\"}}"
}
]
}
}
> use sqlite like jget(body, '').
It looks like this is a problem with calling the capture "body", since that name has a special meaning[0]. If you change the capture name to something else (e.g. "jbody") and update the name in "value" as well, things should work better. For example, when you press 'p', the overlay will show the jget() syntax needed to retrieve individual fields.
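The rename suggested above could look roughly like the fragment below. This is only a sketch: it reuses the pattern from the original post with the last capture renamed from "body" to "jbody", shows just the two places the name appears, and assumes every other key of the format ("title", "timestamp-format", "sample", and so on) stays as posted. The "line-format" entry that referenced "body" is not shown.

```json
{
    "CloudWatchEKS" : {
        "regex" : {
            "basic" : {
                "pattern" : "^(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}?(\\.\\d{6}))((\\+|-)\\d{2}:\\d{2}) (?<type>[a-zA-Z0-9\\-]+)-(?<id>[0-9a-f]+) (?<jbody>{.*})$"
            }
        },
        "value" : {
            "jbody" : {
                "kind" : "json"
            }
        }
    }
}
```

With the capture renamed, a query along the lines of ;SELECT jget(jbody, '/verb') FROM CloudWatchEKS should be possible, assuming the SQL table takes the format's name and jget() is given a JSON Pointer path.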

> Where I'm just interested in some fields for the line-format.
Unfortunately, the line-format functionality only works for JSON-Lines log formats. It won't work for plain-text logs like this one. Leave this bug open so it can be addressed in the future.
[0] - The "body" is for a free-form text field that is automatically parsed to extract key/value pairs.
Wow! Great and thanks for the comment!
I worked around the problem by converting each line to complete JSON, piping it through jq -nRcr 'inputs|capture("^(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}?(\\.\\d{6}))((\\+|-)\\d{2}:\\d{2}) (?<type>[a-zA-Z0-9\\-]+)-(?<id>[0-9a-f]+) (?<body>{.*})$")|(.body|fromjson) as $x|del(.body)|. * $x'. Things look better in pure JSON ;)
Having a single data structure is good, but I remember a past project where we had a base64-encoded, encrypted state dump inside the logs. So it would be nice to have a way to unmarshal embedded content (via a script?) as well as to preprocess logs.