[Feature Request] Support for CloudWatch Logs / Mixed JSON logs

Open markus-geiger opened this issue 5 years ago • 5 comments

I'm trying to write a log format for CloudWatch output, such as what comes from aws logs -f groupname | lnav

Unfortunately, lnav's JSON abilities are currently either very limited or not very well documented.

What I'm trying to parse is:

2021-02-04T08:02:01.678000+00:00 kube-apiserver-audit-bf58dc76fc5024c1991437332767b7d6 {\"kind\... JSONDATA}

Where I'm just interested in some fields for the line-format. Unfortunately I cannot write path-notation for JSON-fields or use sqlite like jget(body, '').

Will this improve in the future?

Here is what I have so far:

{
    "CloudWatchEKS" : {
        "title"            : "CloudWatchEKS",
        "description"    : "Log format used by EKS",
        "url"            : "http://NEW",
        "regex" : {
            "basic"    : {
                "pattern" : "^(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}?(\\.\\d{6}))((\\+|-)\\d{2}:\\d{2}) (?<type>[a-zA-Z0-9\\-]+)-(?<id>[0-9a-f]+) (?<body>{.*})$"
            }
        },
        "json" : false,
        "convert-to-local-time": true,
        "timestamp-format" : [ "%Y-%m-%dT%H:%M:%S" ,"%Y-%m-%dT%H:%M:%S:%L%z"],
        "value" : {
            "type" : {
                "kind" : "string",
                "identifier" : false
            },
            "id" : {
                "kind" : "string",
                "identifier" : true
            },
            "body" : {
                "kind" : "json"
            }
        },
        "line-format" : [
            { "field" : "__timestamp__", "timestamp-format" : "%Y-%m-%dT%H:%M:%S"  },
            " ",
            { "field" : "body" }
        ],
        "sample" : [
            {
				"line" 	: "2021-02-04T08:02:01.678000+00:00 kube-apiserver-audit-bf58dc76fc5024c1991437332767b7d6 {\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Metadata\",\"auditID\":\"XXXXXX-939b-4892-8264-6ccf868339ce\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s\",\"verb\":\"update\",\"user\":{\"username\":\"system:kube-scheduler\",\"groups\":[\"system:authenticated\"]},\"sourceIPs\":[\"172.16.119.165\"],\"userAgent\":\"kube-scheduler/v1.18.9 (linux/amd64) kubernetes/d1db3c4/leader-election\",\"objectRef\":{\"resource\":\"leases\",\"namespace\":\"kube-system\",\"name\":\"kube-scheduler\",\"uid\":\"XXXXX-bac0-4db8-9d18-aa385a585c40\",\"apiGroup\":\"coordination.k8s.io\",\"apiVersion\":\"v1\",\"resourceVersion\":\"175737\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2021-02-04T08:02:01.452753Z\",\"stageTimestamp\":\"2021-02-04T08:02:01.458466Z\",\"annotations\":{\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"system:kube-scheduler\\\" of ClusterRole \\\"system:kube-scheduler\\\" to User \\\"system:kube-scheduler\\\"\"}}"
			}
		]
    }
}

markus-geiger, Feb 04 '21 09:02

> use sqlite like jget(body, '').

It looks like this is a problem with calling the capture "body", since that has a special meaning[0]. If you change the capture name to something else (e.g. "jbody") and update the name in "value" as well, things should work better. For example, when you press 'p', the overlay will show the jget() syntax needed to retrieve individual fields:

[Screenshot: the lnav overlay showing the jget() syntax for the parsed fields]

> Where I'm just interested in some fields for the line-format.

Unfortunately, the line-format functionality only works for JSON-Lines log formats. It won't work for plain-text logs like this one. Please leave this bug open so it can be addressed in the future.

[0] - The "body" is for a free-form text field that is automatically parsed to extract key/value pairs.
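
To make the path notation concrete: jget() takes a JSON Pointer (RFC 6901) as its second argument. The following is not lnav code, only a minimal Python sketch of how such a pointer resolves against a trimmed copy of the sample payload from the first post. The helper name resolve_pointer and the variable jbody are hypothetical and chosen for illustration.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON; return None if missing."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer is either empty or starts with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and token.isdecimal() and int(token) < len(current):
            current = current[int(token)]
        else:
            return None
    return current


# Trimmed copy of the sample payload from the first post.
jbody = json.loads(
    '{"verb":"update",'
    '"user":{"username":"system:kube-scheduler","groups":["system:authenticated"]},'
    '"responseStatus":{"metadata":{},"code":200},'
    '"annotations":{"authorization.k8s.io/decision":"allow"}}'
)

print(resolve_pointer(jbody, "/verb"))
print(resolve_pointer(jbody, "/user/groups/0"))
print(resolve_pointer(jbody, "/responseStatus/code"))
# A "/" inside a key is written as "~1" in the pointer.
print(resolve_pointer(jbody, "/annotations/authorization.k8s.io~1decision"))
```

Inside lnav, the equivalent lookup would be along the lines of jget(jbody, '/user/username'), assuming the capture has been renamed to jbody as suggested above.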

tstack, Feb 04 '21 17:02

Wow! Great and thanks for the comment!

I worked around the problems by converting each line to complete JSON, piping the output through jq -nRcr 'inputs|capture("^(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}?(\\.\\d{6}))((\\+|-)\\d{2}:\\d{2}) (?<type>[a-zA-Z0-9\\-]+)-(?<id>[0-9a-f]+) (?<body>{.*})$")|(.body|fromjson) as $x|del(.body)|. * $x' (the -n keeps inputs from skipping the first line). Things look better in pure JSON ;)
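
For anyone without jq at hand, the same preprocessing step can be sketched in Python. This is only an illustration of the idea in the jq filter above, not a drop-in replacement: the function names flatten_line and flatten_stream are hypothetical, the pattern is rewritten with Python's (?P<name>...) group syntax, and making the fractional seconds optional is an assumption about what the original pattern intended.

```python
import json
import re

# Same shape as the jq capture() pattern, in Python's named-group syntax.
# Assumption: the fractional seconds are optional.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{6})?)"
    r"(?:[+-]\d{2}:\d{2}) "
    r"(?P<type>[a-zA-Z0-9\-]+)-(?P<id>[0-9a-f]+) "
    r"(?P<body>\{.*\})$"
)


def flatten_line(line):
    """Return one compact JSON object string for a CloudWatch line, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    payload = json.loads(record.pop("body"))
    # Payload keys win on conflict, mirroring `. * $x` in the jq filter
    # (the captured fields are all plain strings, so no deep merge is needed).
    record.update(payload)
    return json.dumps(record, separators=(",", ":"))


def flatten_stream(lines):
    """Yield flattened JSON lines, silently skipping input that does not match."""
    for line in lines:
        flattened = flatten_line(line)
        if flattened is not None:
            yield flattened
```

Feeding sys.stdin to flatten_stream and printing each result would give a JSON-Lines stream that can be piped into lnav in the same way as the jq output.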

It's good to have one data structure, but I remember a past project where we had a base64-encoded and encrypted state dump within the logs. So it would be nice to have a way to unmarshal embedded content (via a script?) as well as to preprocess logs.

markus-geiger, Feb 04 '21 19:02