Logging messages can block
This gem can block execution, effectively allowing a website to be DoS'd, if too many messages are sent at the same time. The TCPSocket is synchronous, so the only buffering of messages happens in the kernel. But writes to the socket use write instead of write_nonblock, so once the kernel buffer fills up, the write blocks and the executing Ruby thread has to wait until data is drained from the kernel buffer.
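To see the blocking behavior in isolation, here is a minimal standalone sketch (not fluent-logger code): connect a TCPSocket to a peer that accepts the connection but never reads from it, and write eventually stops returning once the kernel send buffer fills.

require "socket"

# A peer that accepts the connection but never reads from it.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
peer = server.accept # keep a reference so the socket stays open

chunk = "x" * 65_536
total = 0
loop do
  client.write(chunk) # blocks forever once the kernel buffers fill up
  total += chunk.bytesize
  puts "wrote #{total} bytes so far"
end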
Compounding the issue, there is only one TCPSocket connection for all threads of a process. If a web app uses Puma or another threaded server, then as soon as the kernel buffer fills up, every thread will have to wait until buffer room is freed.
To partially solve this issue, I think the write needs to be turned into a write_nonblock. The gem could then detect overflows and discard messages when they happen. It could also buffer overflowing messages, but that would just be a nicety.
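A rough sketch of what that could look like (this send_data and its drop-on-overflow policy are hypothetical, not the gem's actual code):

def send_data(data)
  con = @con # the existing shared connection
  written = con.write_nonblock(data)
  # A partial write means the kernel buffer filled mid-message; a real
  # implementation would have to buffer or discard the remainder.
  written == data.bytesize
rescue IO::WaitWritable
  # Kernel buffer is full: drop the message instead of blocking the thread.
  false
end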
To solve the issue further, there should probably be a separate connection per thread. That would let socket buffering scale better between a single-threaded process and a process with many threads writing messages. But this is also just a nicety on top.
> To solve the issue further, there probably should be a separate connection per thread.
Does that mean a socket should be created on each send, or a socket stored in TLS? Could you show me an implementation example?
@repeatedly: a socket stored in TLS (thread-local storage; at first I thought you meant TLSv1.0, like SSL :).
Something like this:
def connect!
  con = TCPSocket.new(@host, @port)
  # More setup here
  con
end

def send_data(data)
  con = (Thread.current[:fluent_logger_connection] ||= connect!)
  con.write data
end
But again, this is extra niceness. The real issue is the blocking writes. That should be addressed first.
Any update? This could be a major issue: if you're sending data from a web server and there's a problem with td-agent, this logger can block the whole web server, which can then easily crash (an issue we're already having now).
I will investigate this issue next week.
I want to reproduce this issue, but I cannot reproduce it so far. I wrote a very small script for this:
require "fluent-logger"
Fluent::Logger::FluentLogger.open(nil, :host=>'localhost', :port=>24224)
a = []
start = Time.now
100.times do |i|
a << Thread.start(i) do
puts "start#{i}"
1000.times do
Fluent::Logger.post("myapp.access", ENV.to_hash)
end
puts "end#{i}"
end
end
a.each(&:join)
p :done
p Time.now - start
And the Fluentd configuration on the receiving side:

<source>
  @type forward
  port 24224
</source>

<match *.**>
  @type stdout
</match>
@txase @breath103 Please try #39.
@okkez thanks! Sorry for the late reply, I'll let you know after checking it out.
Try setting up a logger that sends to "localhohoho.com", and wait, wait, wait... both at logger creation time and every time the connection is retried.
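For example (a minimal sketch; the unreachable hostname is made up on purpose):

require "fluent-logger"

# Resolving/connecting to an unreachable host stalls right here...
log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhohoho.com', :port => 24224)

# ...and again on every post while the connection keeps being retried.
log.post("myapp.access", {"agent" => "foo"})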
Any updates on this? It's been a year since the last update.
Here is the patch: https://github.com/fluent/fluent-logger-ruby/pull/39 We need feedback from several environments...