
Data loss with s3 plugin in k8s 1.21

lecaros opened this issue

Bug Report

Describe the bug

While running Fluent Bit 1.8.15 or 1.9.2 on Kubernetes under considerable load, some data loss is observed. Some of the error messages are:

[2022/04/26 23:42:28] [error] [output:s3:s3.0] error writing tag metadata
[2022/04/26 23:42:28] [ warn] [output:s3:s3.0] Deleting buffer file because metadata could not be written
[2022/04/26 23:42:28] [error] [fstore] [cio file] error deleting file at close 2022-04-26T22:01:44:17192556665469764707-11341775979779721892
[2022/04/26 23:42:28] [ warn] [output:s3:s3.0] Could not buffer chunk. Data order preservation will be compromised
[2022/04/26 23:42:28] [error] [output:s3:s3.0] Could not marshal msgpack to output string
[2022/04/26 23:42:28] [ warn] [engine] failed to flush chunk '1-1651016113.556939461.flb', retry in 8 seconds: task_id=54, input=tail.0 > output=s3.0 (out_id=0)
[2022/04/26 23:42:28] [ warn] [engine] failed to flush chunk '1-1651016113.560748591.flb', retry in 6 seconds: task_id=63, input=tail.0 > output=s3.0 (out_id=0)
[2022/04/26 23:42:28] [error] [fstore] could not write metadata to file: 2022-04-26T22:01:44:2279422567574872811-2391295758604225780
[2022/04/26 23:42:28] [error] [output:s3:s3.0] error writing tag metadata

The "Could not marshal msgpack to output string" message corresponds directly to dropped records, as shown by querying the /api/v1/metrics endpoint with curl.
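The following is a minimal sketch of how the dropped-record count can be tracked from the metrics endpoint mentioned above. It assumes that Fluent Bit's built-in HTTP server is enabled on the default port 2020 and that each output instance in the JSON response (for example "s3.0") exposes a dropped_records counter. The helper names (fetch_metrics, dropped_records, dropped_delta) are hypothetical and are not part of Fluent Bit.

```python
import json
import urllib.request

# Assumption: HTTP_Server is enabled in the Fluent Bit [SERVICE] section
# and listens on the default port 2020.
METRICS_URL = "http://127.0.0.1:2020/api/v1/metrics"


def fetch_metrics(url: str = METRICS_URL) -> dict:
    """Fetch the JSON metrics document (same data as `curl <url>`)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def dropped_records(metrics: dict) -> dict:
    """Return {output_instance: dropped_records} for every output plugin.

    Assumes the payload has an "output" object keyed by instance name,
    each carrying a "dropped_records" counter; missing counters count as 0.
    """
    outputs = metrics.get("output", {})
    return {name: stats.get("dropped_records", 0) for name, stats in outputs.items()}


def dropped_delta(before: dict, after: dict) -> dict:
    """Records dropped between two metric snapshots, per output instance."""
    prev = dropped_records(before)
    curr = dropped_records(after)
    return {name: curr[name] - prev.get(name, 0) for name in curr}
```

Taking one snapshot with fetch_metrics() before the load test and another afterwards, then calling dropped_delta(before, after), shows how many records s3.0 dropped during the run. Each increase can then be compared with the timestamps of the "Could not marshal msgpack to output string" errors.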

To Reproduce

  • Steps to reproduce the problem:

Steps to reproduce are in README.md inside the provided zip file.

s3-marshal.zip

Expected behavior

No data should be lost.

Your Environment

  • Version used: 1.8.15 and 1.9.2
  • Configuration: see attached
  • Environment name and version (e.g. Kubernetes? What version?): k8s 1.21

lecaros, Apr 27 '22

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot], Aug 08 '22

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot], Dec 07 '22

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot], Dec 13 '22