
Some logs are missing

vascoosx opened this issue 6 years ago • 5 comments

Describe the bug

In my current setup, logs go to Papertrail via syslog+TLS and to a GCP instance via HTTPS, which then forwards them to Stackdriver. Some logs that are present in Papertrail cannot be found in Stackdriver.

To Reproduce

Logs are initially sent through Heroku's log drain. They first go to an nginx server acting as a proxy, then to Fluentd, which sends them to Stackdriver.

Expected behavior

Every log that appears in Papertrail should also appear in Stackdriver.

Your Configuration

client setting:

<source>
  @type http
  tag <tag name>
  <parse>
    @type regexp
    expression /^.*<\S+>\d (?<time>\S+) host app web.1 - (?<severity>.), (?<message>.*)$/
  </parse>
  port <port>
  bind 0.0.0.0
  add_remote_addr https://<url>
</source>

Your Error Log

# /var/log/google-fluentd/google-fluentd.log

2019-09-01 06:25:01 +0000 [info]: #0 flushing all buffer forcedly
2019-09-01 06:25:01 +0000 [info]: #0 detected rotation of /var/log/nginx/access.log; waiting 5 seconds
2019-09-01 06:25:01 +0000 [info]: #0 following tail of /var/log/nginx/access.log
2019-09-01 06:25:01 +0000 [info]: #0 detected rotation of /var/log/syslog; waiting 5 seconds
2019-09-01 06:25:01 +0000 [info]: #0 following tail of /var/log/syslog

(no errors were found in the nginx logs)

Additional context

Agent version: google-fluentd 1.4.2
OS: Ubuntu 18.04

The text below is a portion of the logs. Asterisks denote the logs that were missing from Stackdriver.

05:54:11.468349
05:54:11.474820
05:54:11.477478 *
05:54:11.481780 *
05:54:11.484050 *
05:54:11.485974 *
05:54:11.488010 *
05:54:11.491051 *
05:54:11.492902 *
05:54:11.495263 *
05:54:11.497550 *
05:54:11.498517 *
05:54:11.499052 *
05:54:12.163430
05:54:12.272951
05:54:12.298832 * 
05:54:12.304858 *
05:54:12.307521 *
05:54:12.309893 *
05:54:12.310037 *
05:54:12.311776 *
05:54:12.313578 *
05:54:12.315410 *
05:54:12.317899 *
05:54:12.319555 *
05:54:12.321456 *
05:54:12.323302 *
05:54:12.323988 *
05:54:12.324458 *
05:54:12.796234
05:54:12.916607

vascoosx avatar Sep 02 '19 04:09 vascoosx

Is this a fluentd core bug? Are the logs lost inside fluentd itself or in a third-party plugin?

repeatedly avatar Sep 02 '19 04:09 repeatedly

We can't set up GCP or other cloud services. Could you reproduce the issue in a simpler environment, e.g. on a single Linux server?
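
One way to narrow it down without a cloud environment is to duplicate the ingested events to a local file and diff that file against what arrives in Stackdriver: events missing from the local copy never made it through the http source/parser, while events present locally but absent from Stackdriver point at the output or buffering path. A minimal sketch, assuming a hypothetical tag pattern and file path, with google_cloud standing in for the existing Stackdriver output:

<match heroku.**>
  @type copy
  <store>
    # write every event that passed the http source and parser to disk
    @type file
    path /var/log/fluent/ingest-copy
  </store>
  <store>
    # the existing Stackdriver output would remain here (plugin name assumed)
    @type google_cloud
  </store>
</match>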

repeatedly avatar Sep 02 '19 04:09 repeatedly

Thank you. I'll try reproducing it in a simpler environment. Meanwhile, could you tell me whether there is any specification for the maximum throughput of the http source? Wherever the issue stems from, it seems to be a load-related issue.

vascoosx avatar Sep 02 '19 06:09 vascoosx

Meanwhile, could you tell me whether there is any specification for the maximum throughput of the http source?

I'm not sure, because it depends on machine spec, format, and more. The official documentation mentions one example: https://docs.fluentd.org/input/http#handle-large-data-with-batch-mode
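
For reference, the batch mode described there amounts to sending several events per request instead of one request per event. A rough illustration against a default in_http endpoint (port and tag are placeholders, and it assumes the default JSON handling rather than the regexp parser shown earlier):

curl -X POST -H 'Content-Type: application/json' \
  -d '[{"message":"first"},{"message":"second"},{"message":"third"}]' \
  http://localhost:9880/app.log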

repeatedly avatar Sep 05 '19 21:09 repeatedly

We have a similar problem when using the splunk_hec plugin to forward messages to an external Splunk installation via the Splunk heavy forwarder.

We have noticed that when the problem manifests, we see this error in the fluent log:

2019-08-20 14:05:44 +0000 [info]: Worker 0 finished unexpectedly with signal SIGKILL

If the worker is killed, I suspect all of the messages that were in the queue are lost. Is this a correct assumption? We're not currently configured to handle overflow conditions (for example, by backing the buffer with a file). We lost three days' worth of messages that had yet to be funneled over to Splunk when this happened.

Looking for clarification to help determine whether it's fluentd or the plugin that is at fault.

vguaglione avatar Sep 17 '19 14:09 vguaglione

Sorry for the delay.

@vguaglione

If the worker is killed, I suspect all of the messages that were in the queue are lost. Is this a correct assumption?

You can use a file buffer. Log loss due to the process being forcibly killed cannot be completely prevented, but it can be minimized.
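
A minimal sketch of such a file buffer on the Splunk output mentioned above (plugin name, tag pattern, and paths are assumptions; connection options are omitted; the buffer parameters are standard Fluentd v1 options):

<match your.tag.**>
  @type splunk_hec           # output plugin named earlier in this thread; connection options omitted
  <buffer>
    @type file               # persist queued chunks on disk instead of in memory
    path /var/log/fluent/buffer/splunk
    flush_interval 10s       # how often staged chunks are flushed to the output
    retry_forever true       # keep retrying failed chunks rather than discarding them
    overflow_action block    # apply back-pressure instead of erroring when the buffer fills
  </buffer>
</match>

With a file buffer, chunks already staged or queued on disk are retried after the worker restarts, so a forced kill typically loses at most the data being appended at that moment.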

daipom avatar Apr 28 '23 01:04 daipom

@vascoosx I will close this issue, as there has been no update for a while.

If you are still experiencing this problem and know anything about how to reproduce it, please re-open.

daipom avatar Apr 28 '23 01:04 daipom