
Nested array of strings issue

Open i5okie opened this issue 3 years ago • 0 comments

Using Fluentd with columnify.

Running on Kubernetes to push logs to S3 in Parquet. The issue arises when trying to use an Avro schema with a nested map or a list of strings.

According to the Kubernetes metadata filter plugin's docs (https://github.com/ViaQ/fluent-plugin-kubernetes_metadata_input/blob/master/README.md#kubernetes-labels-and-annotations), I believe that columnify would get a nested array of strings.

Got this schema to work so far, but still running into issues upstream with Athena.

{
  "type": "record",
  "name": "record",
  "fields": [
    {
      "name": "message",
      "type": "string"
    },
    {
      "name": "logtag",
      "type": "string"
    },
    {
      "name": "stream",
      "type": "string"
    },
    {
      "name": "time",
      "type": ["null", "string"]
    },
    {
      "name": "docker",
      "type": {
        "type": "record",
        "name": "docker",
        "fields": [
          {
            "name": "container_id",
            "type": "string"
          }
        ]
      }
    },
    {
      "name": "kubernetes",
      "type": {
        "type": "record",
        "name": "kubernetes",
        "fields": [
          {
            "name": "container_name",
            "type": "string"
          },
          {
            "name": "host",
            "type": ["null", "string"]
          },
          {
            "name": "master_url",
            "type": ["null", "string"]
          },
          {
            "name": "namespace_name",
            "type": ["null", "string"]
          },
          {
            "name": "pod_id",
            "type": ["null", "string"]
          },
          {
            "name": "pod_name",
            "type": ["null", "string"]
          },
          {
            "name": "labels",
            "type": {
              "type": "array",
              "items": {
                "name": "label",
                "type": "record",
                "fields": [ {
                  "type": ["null", "string"]
                } ]
              }
            }
          }
        ]
      }
    }
  ]
}

Specifically, the issue is with the labels part. I think this should work in place of the array of records:

{
  "name": "labels",
  "type": {
    "type": "array",
    "items": {
      "type": "list",
      "values": "string"
    }
  }
}
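
For comparison, the Avro specification defines `array` (with `items`) and `map` (with `values`, where keys are always strings), but no `list` type. A minimal sketch of a plain array of strings, assuming the labels really do arrive as a list; this has not been verified against columnify or Athena:

```json
{
  "name": "labels",
  "type": {
    "type": "array",
    "items": "string"
  }
}
```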

Example data before fluentd filters:

{
  "stream": "stdout",
  "logtag": "F",
  "message": "  Tue Nov 22 23:51:12 UTC 2022 Found redis master (172.20.203.160)",
  "time": 1669161072.283568,
  "docker": {
    "container_id": "29e32e64745530e7a1c5e9174f9e266e051707aec6a76d4556871532157a"
  },
  "kubernetes": {
    "container_name": "split-brain-fix",
    "namespace_name": "argocd",
    "pod_name": "argocd-redis-ha-server-0",
    "container_image": "docker.io/library/redis:6.2.6-alpine",
    "container_image_id": "docker.io/library/redis@sha256:132337b9d7744ffee4fae83fde53c3530935ad3ba528b7110f2d805f55cbf5",
    "pod_id": "ee5af2aa-14d8-446c-9755-",
    "pod_ip": "10.64.124.43",
    "host": "ip-10-64-116-85.us-west-2.compute.internal",
    "labels": {
      "app": "redis-ha",
      "argocd-redis-ha": "replica",
      "controller-revision-hash": "argocd-redis-ha-server-7cd67685d6",
      "release": "argocd",
      "statefulset_kubernetes_io/pod-name": "argocd-redis-ha-server-0"
    },
    "master_url": "https://172.20.0.1:443/api",
    "namespace_id": "f3d1453d-d227-4c54-982a-457d5b99cc8b",
    "namespace_labels": {
      "app_kubernetes_io/managed-by": "Helm",
      "kubernetes_io/metadata_name": "argocd"
    }
  },
  "tag": "kubernetes.var.log.containers.argocd-redis-ha-server-0_argocd_split-brain-fix-29e32e64745530e7a171e08251707aec6a76d4556871532157a.log"
}
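
In the example above, `labels` is a JSON object whose values are all strings, which is the shape an Avro `map` describes. A sketch of that field follows, as an assumption rather than a tested fix; whether columnify and Athena accept it is not verified here:

```json
{
  "name": "labels",
  "type": {
    "type": "map",
    "values": "string"
  }
}
```

The same form would presumably apply to `namespace_labels`, which has the same object-of-strings shape in the example data.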

But I'm getting this error:

2022-11-23 00:29:43 +0000 [warn]: #0 [out_s3] got unrecoverable error in primary and no secondary error_class=Fluent::UnrecoverableError error="failed to execute columnify command. stdout= stderr=panic: runtime error: index out of range [0] with length 0\n\ngoroutine 1 [running]:\ngithub.com/xitongsys/parquet-go/layout.PagesToChunk(0x10ea6d8, 0x0, 0x0, 0x20)\n\t/home/runner/go/pkg/mod/github.com/xitongsys/[email protected]/layout/chunk.go:24 +0x90d\ngithub.com/xitongsys/parquet-go/writer.(*ParquetWriter).Flush(0xc00074fcc0, 0xc00010e001, 0x10, 0xa3abc0)\n\t/home/runner/go/pkg/mod/github.com/xitongsys/[email protected]/writer/writer.go:285 +0x3d5\ngithub.com/xitongsys/parquet-go/writer.(*ParquetWriter).WriteStop(0xc00074fcc0, 0x0, 0xc00010e050)\n\t/home/runner/go/pkg/mod/github.com/xitongsys/[email protected]/writer/writer.go:120 +0x37\ngithub.com/reproio/columnify/columnifier.(*parquetColumnifier).Close(0xc00000c6c0, 0xc00086fe18, 0x9d5cff)\n\t/home/runner/work/columnify/columnify/columnifier/parquet.go:122 +0x2e\nmain.columnify.func1(0xc2d760, 0xc00000c6c0, 0xc00086fec0)\n\t/home/runner/work/columnify/columnify/cmd/columnify/columnify.go:24 +0x35\nmain.columnify(0xc2d760, 0xc00000c6c0, 0xc00013a0f0, 0x1, 0x1, 0x0, 0x0)\n\t/home/runner/work/columnify/columnify/cmd/columnify/columnify.go:36 +0xe2\nmain.main()\n\t/home/runner/work/columnify/columnify/cmd/columnify/columnify.go:71 +0x545\n status=#<Process::Status: pid 48 exit 2>"
  2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-s3-1.7.2/lib/fluent/plugin/s3_compressor_parquet.rb:60:in `compress'
  2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-s3-1.7.2/lib/fluent/plugin/out_s3.rb:352:in `write'
  2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/output.rb:1180:in `try_flush'
  2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/output.rb:1501:in `flush_thread_run'
  2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/output.rb:501:in `block (2 levels) in start'
  2022-11-23 00:29:43 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

i5okie, Nov 23 '22 19:11