fluent-plugin-opensearch

Could not communicate to OpenSearch, resetting connection and trying again. [404]

Open kentan88 opened this issue 1 year ago • 4 comments

Steps to replicate

Provide example config and message

Dockerfile

# Use the fluentd base image
FROM fluent/fluentd:v1.15-debian-1

USER root

# Install necessary dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

RUN gem install faraday-net_http multi_json aws-eventstream faraday aws-sigv4 opensearch-ruby faraday_middleware-aws-sigv4 fluent-plugin-opensearch excon faraday-excon jmespath aws-partitions aws-sdk-core

# Switch back to fluent user
USER fluent

# Copy the configuration file to the Fluentd configuration directory
COPY ./config/fluent-opensearch.conf /fluentd/etc/fluent.conf

# Expose port for Fluentd
EXPOSE 24224

# Run Fluentd with the configuration file copied above
CMD ["fluentd", "-c", "/fluentd/etc/fluent.conf"]

fluent.conf

<match es.**>
  @type opensearch
  logstash_format true
  include_tag_key true
  flush_interval 1s

  <endpoint>
    url https://xxxxx.ap-southeast-1.aoss.amazonaws.com
    region ap-southeast-1
    access_key_id XXXXXXXXXXXX
    secret_access_key XXXXXXXXXXXX
    aws_service_name aoss
  </endpoint>
</match>

Expected Behavior or What you need to ask

I'm running a local Docker container that uses fluent/fluentd:v1.15-debian-1 as the base image. When I run the container, I get the following messages:

2024-02-20 07:52:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:23 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-02-20 07:52:23 +0000 [info]: gem 'fluentd' version '1.15.3'
2024-02-20 07:52:23 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.4'
2024-02-20 07:52:23 +0000 [info]: using configuration file: <ROOT>
  <match es.**>
    @type opensearch
    <endpoint>
      url https://XXXXXXXXXXXX.ap-southeast-1.aoss.amazonaws.com/
      region "ap-southeast-1"
      access_key_id "XXXXXXXXXXXX"
      secret_access_key xxxxxx
      aws_service_name aoss
    </endpoint>
  </match>
</ROOT>
2024-02-20 07:52:23 +0000 [info]: starting fluentd-1.15.3 pid=7 ruby="3.1.3"
2024-02-20 07:52:23 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-02-20 07:52:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:24 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:24 +0000 [info]: adding match pattern="es.**" type="opensearch"
2024-02-20 07:52:26 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:26 +0000 [warn]: #0 Remaining retry: 14. Retry to communicate after 2 second(s).
2024-02-20 07:52:30 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:30 +0000 [warn]: #0 Remaining retry: 13. Retry to communicate after 4 second(s).
2024-02-20 07:52:38 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:38 +0000 [warn]: #0 Remaining retry: 12. Retry to communicate after 8 second(s).

I can confirm that the AWS credentials and the AWS OpenSearch Serverless endpoint are correct and reachable, as I was able to send data using a Ruby OpenSearch client.

Any help would be much appreciated. ...

Using Fluentd and OpenSearch plugin versions

  • OS version fluent/fluentd:v1.15-debian-1
  • Docker
  • Fluentd v1.15.3
  • OpenSearch plugin version 1.1.4

kentan88, Feb 20 '24 08:02

I'm having the same problem with the OpenSearch K8s operator, and I have to restart the Fluentd DaemonSet every time to fix it.

mhkarimi1383, Apr 23 '24 17:04

@kentan88

Have you tried setting reload_on_failure to true? I saw this option in the README. I will test it; I think it will resolve the issue :)

mhkarimi1383, Apr 25 '24 08:04

Setting reload_on_failure to true did not fix the problem.
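For reference, a minimal sketch of where the option sits in the output section. Only `reload_on_failure` comes from this thread; the `<endpoint>` block is assumed to stay as in the original report and is left out here.

```
<match es.**>
  @type opensearch
  # Option tested above; on its own it did not fix the problem.
  reload_on_failure true
  # <endpoint> ... </endpoint> as in the original config (omitted here)
</match>
```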

mhkarimi1383, Apr 27 '24 20:04


livenessProbe:
  httpGet: null
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  exec:
    command:
      - bash
      - -c
      - >
        set -ex;
        curl -s http://localhost:24231/metrics
        | grep -E "fluentd_output_status_retry_wait|fluentd_output_status_num_errors|fluentd_output_status_retry_count"
        | grep -Ev "# HELP|# TYPE"
        | grep -Ev ' 0(\.0)?$'
        | wc -l | grep -qx 0

I have added these values to the DaemonSet Helm chart. They should restart the containers whenever a retry or an error happens.

(Do not forget to install curl in your Docker image.)
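The probe's check can also be factored into a small shell function, which makes it easier to try outside the cluster. This is only a sketch, not part of the chart: the function name `fluentd_outputs_healthy` is hypothetical, and it assumes each sample line ends with its value (no trailing timestamp). It compares the value numerically, so a sample such as 10.0 counts as non-zero.

```shell
#!/usr/bin/env bash

# fluentd_outputs_healthy (hypothetical helper name):
# reads Prometheus text-format metrics on stdin and returns 0 when every
# retry/error sample is zero, 1 when at least one of them is non-zero.
fluentd_outputs_healthy() {
  awk '
    # Skip "# HELP" and "# TYPE" comment lines.
    /^#/ { next }
    # Look only at the three output metrics used by the probe.
    /^fluentd_output_status_(retry_wait|num_errors|retry_count)[ {]/ {
      # Assumption: the last field is the sample value.
      if ($NF + 0 != 0) bad++
    }
    # Exit status 1 if any sample was non-zero, 0 otherwise.
    END { exit (bad > 0) }
  '
}
```

In the probe, it would be fed from the same endpoint: `curl -s http://localhost:24231/metrics | fluentd_outputs_healthy`.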

mhkarimi1383, May 01 '24 16:05