
Investigate missing logs in connected elasticsearch

Open charlieegan3 opened this issue 7 years ago • 1 comments

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: I first configured Elasticsearch as described in the docs: https://docs.tarmak.io/user-guide.html#setting-up-an-aws-hosted-elasticsearch-cluster

I then set the config in my tarmak YAML (as also described at the end of that docs section):

apiVersion: api.tarmak.io/v1alpha1
clusters:
- name: cluster
  loggingSinks:
  - types: ["all"]
    elasticsearch:
      host: ${elasticsearch_endpoint}
      tls: true
      amazonESProxy: {}
  amazon:
    additionalIAMPolicies:
    - ${elasticsearch_shipping_policy_arn}

Fluent Bit seems to be having issues sending the logs over. It looks like the logs are being truncated:

{"log":"{\"took\":16039,\"errors\":true,\"items\":[{\"index\":{\"_index\":\"logstash-2018.08.06\",\"_type\":\"flb_type\",\"_id\":\"ZgyXD2UBIbhXqGWXL66c\",\"status\":400,\"error\":{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in index [logstash-2018.08.06] has been exceeded\"}}},{\"index\":{\"_index\":\"logstash-2018.08.06\",\"_type\":\"flb_type\",\"_id\":\"ZwyXD2UBIbhXqGWXL66c\",\"status\":400,\"error\":{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in index [logstash-2018.08.06] has been exceeded\"}}},{\"index\":{\"_index\":\"logstash-2018.08.06\",\"_type\":\"flb_type\",\"_id\":\"aAyXD2UBIbhXqGWXL66c\",\"status\":429,\"error\":{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of org.elasticsearch.transport.TransportService$7@276e88f3 on EsThreadPoolExecutor[name = NB_Fvbw/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@347b2cc6[Running, pool size = 2, active threads = 2, queued tasks = 20\n","stream":"stderr","time":"2018-08-06T14:15:07.077562778Z"}
{"log":"{\"took\":13981,\"errors\":true,\"items\":[{\"index\":{\"_index\":\"logstash-2018.08.06\",\"_type\":\"flb_type\",\"_id\":\"nS2XD2UB_DVwLD0FP71D\",\"status\":429,\"error\":{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of org.elasticsearch.transport.TransportService$7@15a6bb46 on EsThreadPoolExecutor[name = NB_Fvbw/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@347b2cc6[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 250494]]\"}}},{\"index\":{\"_index\":\"logstash-2018.08.06\",\"_type\":\"flb_type\",\"_id\":\"ni2XD2UB_DVwLD0FP71D\",\"status\":429,\"error\":{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of org.elasticsearch.transport.TransportService$7@3f32c8d2 on EsThreadPoolExecutor[name = fJancXV/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@24d4b38c[Running, pool size = 2, active threads = 2, queu\n","stream":"stderr","time":"2018-08-06T14:15:09.02842921Z"}

It's worth noting that the "Limit of total fields [1000] in index" error is probably unrelated. That limit can't be changed on Amazon anyway.
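
Because the responses above are cut off mid-object, a JSON parser will reject them. A minimal sketch, assuming the log text is available as a string, that tallies the per-item status codes and error types with regular expressions instead. The helper name `tally_bulk_errors` is hypothetical and is not part of Fluent Bit or Elasticsearch.

```python
import re
from collections import Counter

# Quotes may be backslash-escaped (container log wrapping) or plain,
# so each quote is preceded by an optional backslash.
STATUS_RE = re.compile(r'\\?"status\\?":(\d{3})')
TYPE_RE = re.compile(r'\\?"error\\?":\{\\?"type\\?":\\?"([a-z_]+)')


def tally_bulk_errors(text):
    """Count status codes and error types in (possibly truncated) bulk responses."""
    statuses = Counter(int(code) for code in STATUS_RE.findall(text))
    error_types = Counter(TYPE_RE.findall(text))
    return statuses, error_types
```

Running this over the full Fluent Bit log would show whether the 400 (field limit) or the 429 (bulk queue rejection) responses dominate, which might help decide whether the two errors are related.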

This is the only person that seems to have the same issue: https://github.com/fluent/fluent-bit/issues/432

It looks like Elasticsearch is meant to produce error logs in these cases, but when I connected it to CloudWatch the logs were empty.

What you expected to happen: The logs to appear in elasticsearch

How to reproduce it (as minimally and precisely as possible): as above

Environment: Single cluster, with Elasticsearch created following the process in the docs.

charlieegan3 avatar Aug 06 '18 17:08 charlieegan3

I was working on #583 and ran fluent_bit with Elasticsearch. I ran into the following issue, but couldn't pinpoint the problem:

Oct 17 14:36:05 ip-10-99-77-49.eu-west-1.compute.internal td-agent-bit[5603]: [2018/10/17 14:36:05] [error] [out_es] could not pack/validate JSON response
Oct 17 14:36:05 ip-10-99-77-49.eu-west-1.compute.internal td-agent-bit[5603]: {"took":41,"errors":true,"items":[{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"BJp0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":257369,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"BZp0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":251844,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"Bpp0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":251845,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"B5p0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":251783,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_typ
Oct 17 14:36:05 ip-10-99-77-49.eu-west-1.compute.internal td-agent-bit[5603]: [2018/10/17 14:36:05] [ warn] [out_es] Elasticsearch error
Oct 17 14:36:05 ip-10-99-77-49.eu-west-1.compute.internal td-agent-bit[5603]: {"took":41,"errors":true,"items":[{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"BJp0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":257369,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"BZp0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":251844,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"Bpp0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":251845,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id":"B5p0gmYBLP9FIf5mgDFj","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":251783,"_primary_term":1,"status":201}},{"index":{"_index":"test-2018.10.17","_type":"flb_type","_id

This was with a basic cluster with nothing running on it.
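
The response above reports "errors":true although every item visible before the cut-off has status 201, so any failing items are presumably in the part that was truncated. A minimal sketch, assuming the complete response body can be captured (for example by replaying the bulk request by hand), that lists only the failed items. `failed_bulk_items` is a hypothetical helper, not something Fluent Bit or tarmak provides.

```python
import json


def failed_bulk_items(body):
    """Return (action, _id, status, error type) for each non-2xx bulk item."""
    response = json.loads(body)
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key object: {"index": {...}}, {"create": {...}}, ...
        for action, result in item.items():
            status = result.get("status", 0)
            if status >= 300:
                error = result.get("error")
                # The error is normally an object with a "type" key; fall back to the raw value.
                error_type = error.get("type") if isinstance(error, dict) else error
                failures.append((action, result.get("_id"), status, error_type))
    return failures
```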

MattiasGees avatar Oct 17 '18 15:10 MattiasGees